Saturday, March 19, 2011

Update on Ruminate

It’s been a couple weeks, but I wanted to say a little bit about the Feb 26 Release of Ruminate.

Who should be interested in Ruminate?

This release is close to level of refinement and capabilities necessary for use in an operational environment. Ruminate will be useful for people who are willing to spend extensive effort integrating their own network monitoring tools. I doubt very few people will want to use it in exactly how it is out of the box, but many of the components or even the whole framework (with custom detections running on top) may be useful to others. Ruminate as currently constituted is not for those who want a simple install process. Ruminate doesn’t do alerting or event correlation. It is up to the user to integrate Ruminate with an external log correlation and alerting system.

The Good

I think the Ruminate architecture is very promising. It makes some things that are very hard to do in conventional NIDS look very easy. The following diagram shows the layout of the Ruminate components:

If you are totally new to Ruminate, I still suggest reading the technical report.

The improved HTTP parser is pretty groovy. I’m really pleased with the attempts I’m making at things like HTTP 206 defrag. I think my HTTP log format, which includes compact single character flags inspired by network flow records (e.g. argus), is pretty cute.

Since I haven’t documented it anywhere else, let me do it here. The fields in the logs (with examples) are as follows:

Jan 12 01:47:39 node1 http[26350]: tcp-198786717-1294814857-1294814859-c-33510- 1.1 GET /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 0 32768 206 1292442029 application/pdf TG ALHEk - "zh-CN,zh;q=0.8" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10" "Apache"

Transaction ID: tcp-198786717-1294814857-1294814859-c-33510-
Request Version: 1.1
Request Method: GET
Request Host:
Request Resource: /~tr-admin/papers/GMU-CS-TR-2010-20.pdf
Request Payload Size: 0
Response Payload Size: 32768
Response Code: 206
Response Last-Modified (unix timestamp): 1292442029
Response Content-Type: application/pdf
Response Flags: TG
Request Flags: ALHEk
Request Referer:
Request X-Forwarded-For: -
Request Accept-Language (in quotes): "zh-CN,zh;q=0.8"
Request User-Agent (in quotes): "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10"
Response Server (in quotes): "Apache"

The Request Flags are as follows:

C => existence of "Cookie" header
Z => existence of "Authorization" header
T => existence of "Date" header
F => existence of "From" header
A => existence of "Accept" header
L => existence of "Accept-Language" header
H => existence of "Accept-Charset" header
E => existence of "Accept-Encoding" header
k => "keep-alive" value in Connection
c => "close" value in Connection
o => other value in Connection
V => existence of "Via" header

The Response Flags are as follows:

C => existence of "Set-Cookie" header
t => existence of "Transfer-Encoding" header, presumably chunked
g => gzip content encoding
d => deflate content encoding
o => other content encoding
T => existence of "Date" header
L => existence of "Location" header
V => existence of "Via" header
G => existence of "ETag" header
P => existence of "X-Powered-By" header
i => starts with inline for Content-Disposition
a => starts with attach for Content-Disposition
f => starts with form-d for Content-Disposition
c => other Content-Disposition

While not standard in any way, this log format should be very useful for my research.

The Bad

Ruminate is rough. It’s nowhere near the level of refinement of the leading NIDS. This is not likely to change in the short term.
Ruminate is based on a really old version of vortex. There are lots of reasons this isn’t optimal but the biggest issue is performance on high speed networks. Soon I’ll release a new version that is either based on the latest version of vortex or one that is totally separate from, but dependent on, vortex.

Yara Everywhere

I’ve added yara to the basically every layer of Ruminate. This is useful for those in operational environments because many people are used to and have existing signatures written for yara. Since Ruminate is very object focused (not network focused), yara makes a lot of sense. While applying signatures to raw streams is not what Ruminate is about, it was easy to do and may even be useful for environments struggling with limitations in signature matching NIDS. Lastly, the use of yara, with its extensive meta-signature rule definitions, helps fill a gap in Ruminate which can’t reasonably be filled by an external event correlation engine.

Ruminate or Razorback (or both)

I’ve been asked, and it’s a good question, how Ruminate and Razorback compare. Before I express my candid opinions, I want to say that I’m very pleased with what the VRT guys are doing with Razorback. While there is some overlap in what I’m doing and what they’re doing (at least in high level goals), there’s more than enough room for multiple people innovating in the network payload object analysis space. If nothing else, the more people in the space, the more legitimate the problem of analyzing client payload objects (file) becomes. It seems unfathomable to me, but there are many who still question the value of using NIDS for detecting attacks against client applications (Adobe, IE) versus the traditional server exploits (IIS, WuFTP) or detection of today’s reconnaissance (google search) versus old school reconnaissance (port scan).

To date, Ruminate’s unique contributions are very much focused on scalable payload object collection, decoding, and reconstruction. Notable features include dynamic and highly scalable load balancing of network streams, full protocol decoding for HTTP and SMTP/MIME, and object fragment reassembly (ex. HTTP 206 defrag). If you want to comprehensively analyze payloads transferred through a large network, Ruminate is the best openly available tool for exposing the objects to analysis. The actual object analysis is pretty loose in Ruminate today, but is definitely simple and scalable. Ruminate’s biggest shortcoming is its rough implementation and relatively low level of refinement. This isn’t a problem for academia and other research, but it is a barrier to widespread adoption.

Razorback is largely tackling the other end of the problem—what to do once you get the objects off the network (or host or other source for that matter). Razorback has a robust and well defined framework for client object analysis. While definitely in early beta state, Razorback is a whole lot more refined and “cleaner” than Ruminate. Razorback has centrally controlled object distribution model which has obvious advantages and disadvantages over what Ruminate is doing. Razorbacks’ limitations in network payload object extraction are inherited largely from it’s reliance on the Snort 2.0 framework, which to be fair, was never designed for this sort of analysis.

While I’ve never actually done it, if there was a brave soul who wanted to combine the best of both Ruminate and Razorback, it would be possible to use Ruminate to extract objects off the network and use Razorback to analyze the objects. Using the parlance of both respectively, one could modify Ruminate’s object multiplexer (object_mux) to be a collector for Razorback. The point I'm trying to make is that the innovations found in Ruminate and Razorback may be more complimentary than competing.

Take what you want (or leave it)

I’m sharing what I’ve implemented in hopes that it helps advance academic research and the solutions used in industry. Please take Ruminate as a whole, some components, or simply the ideas or paradigm and run with them. I’m always interested in hearing feedback on Ruminate or the ideas it advances. I’m also open to working with others on research using or continued development of Ruminate.

No comments:

Post a Comment