Splitting Vortex and Ruminate Server
Previous versions of Ruminate have been based on what was essentially a fork of Vortex. Now Ruminate relies on Vortex (or some similar tool, such as tcpflow) to generate network streams, and Ruminate takes it from there. This allows Ruminate to benefit immediately from any updates to Vortex and better fits the implementation paradigm I've chosen for Ruminate (loose composition of many small components). It also allows for a single instance of the stream capture mechanism (instead of one per protocol).
The new architecture looks like:
Now you start the stream distribution mechanism on the capture server with something like:
vortex {options} | ruminate_server
Note that ruminate_server doesn't take the same options as the old one. I haven't yet decided how I want to specify some of the options (like which streams get classified as which protocol and which port those streams are distributed on) so these are set in the code. In the future, I hope to make this much more flexible, allowing for protocol selection to be based not only on port, but also on content. Right now, streams are processed by the first, and only the first, protocol parser whose filter is matched by the stream. In the future, I'd like to support more than one, probably by giving a copy to each parser that wants the stream.
Significant Fix to http_parser
Those who have used Ruminate extensively will know that it occasionally comes across a stream that just kills the performance of http_parser. It's not that big a deal if one of many http_parsers churns for a long time, but it's clearly not ideal. From what I can tell, the major cause of this situation is inefficient code in http_parser in the case of an HTTP response that doesn't include a Content-Length header. I've put in a fix for this that provides orders of magnitude improvement in this case, especially if the response payload is large.
Going Forward
There are a few things I'm looking at doing going forward. I mentioned enhancing stream distribution mechanisms above.
I may also try to publicly share some performance stats from Ruminate running on a large network (~1 Gbps) so that I can demonstrate that Ruminate really does scale well. Most of the data I've published has involved Ruminate being used on data sets much smaller than I would have liked.
I'm thinking of creating a Flash scanning service similar to the PDF service. Exploitation of SWF vulnerabilities is rampant. Like PDFs, some of the complications of SWFs (such as file format compression and an internal scripting language) are good for demonstrating the benefits of Ruminate.
The point of these object analyzers has primarily been to demonstrate the value of the framework and its associated mechanisms, but in the future I hope to innovate in detection mechanisms as well.
While my primary purpose in building Ruminate is to conduct research, I hope sharing my implementation will be helpful to some, notwithstanding the many imperfections.