Saturday, May 1, 2010

Vortex Howto Series: Parallel NRT IDS

To fulfill all the major tasks I promised when I began this series of vortex howto articles, this installment will focus on scaling up the network analysis done with vortex in a way that leverages the highly parallel nature of modern servers. While the techniques shared in this post are applicable to all the uses of vortex demonstrated so far, they are especially applicable to near real-time network analysis, a major goal of which is to support detections not possible with conventional IDS architectures, including high-latency and/or highly computationally expensive analysis. If you are new to NRT IDS and its goals, I recommend reading about snort-nrt, especially this blog post, which explains why some very useful detection just can’t be done in traditional IDS architectures. Since we’re going to build upon the work done in installment 3, I highly recommend reading it if you haven’t.

Many of us learned about multiprocessing and its advantages in college. In cases where you have high-latency analysis, often caused by I/O such as querying a database, multiprocessing allows you to keep your processor(s) busy while accomplishing many high-latency tasks in parallel. Traditionally, if you want to do computationally expensive work that can’t be done quickly enough on a single processor, you have two options: use a faster processor or use multiple processors in parallel. Well, if you haven’t noticed, processor speeds haven’t increased for quite some time, but the number of processors in computers has increased fairly steadily. Therefore, as you scale up computationally expensive work on commodity hardware, your only serious choice is to parallelize. While the hard real-time constraints of IPS make high-latency analysis impossible and computationally expensive analysis difficult, if you are satisfied with near real-time, it’s a lot easier to efficiently leverage parallel processing.

Note that throughout this article, I’m not going to make a clear distinction between multi-threading, multi-processing, and multi-system processing. While textbooks draw a stark distinction, modern hardware and software somewhat blur the differences. For the purposes of this article, the distinction isn’t really important anyway.

Vortex is a platform for network analysis, but it doesn’t care if the analyzer you use is single- or multi-threaded. Vortex works well either way. However, xpipes, which is distributed with vortex, does make it easy to turn a single-threaded analyzer into a highly parallel analyzer even if, or especially in the cases where, the analyzer is written in a language that doesn’t support threading.

Xpipes borrows much of its philosophy (and name) from xargs. Like xargs, it reads a list of data items (very often filenames) from STDIN and is usually used in conjunction with a pipe, taking input from another program. While xargs takes inputs and plops them in as arguments to another program, xpipes takes inputs and divides them between multiple pipes feeding other programs. If you are in a situation where xargs works for you, then by all means, use it. Xpipes was written to fit right between vortex and a vortex analyzer without modifying either, thereby maintaining the vortex interface. Xpipes spawns multiple independent instances of the analyzer program and divides traffic between them, feeding each stream to the next available analyzer. In general, xpipes is pretty efficient.
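To make the contrast concrete, here is a minimal sketch. The ./analyzer program is hypothetical: in the xargs case it is assumed to accept a stream file name as a command line argument, while in the xpipes case it is assumed to read stream file names from STDIN, the way vortex analyzers do.

# xargs: each stream file name vortex emits becomes an argument to a
# fresh invocation of the analyzer
$ vortex | xargs -n 1 ./analyzer

# xpipes: the same file names are divided among 4 persistent analyzer
# instances, each reading its share from its own STDIN
$ vortex | xpipes -P 4 -c './analyzer'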

Slightly simplifying the ssdeep-n NRT IDS from our last installment, we get:


vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n \
-K 600 | ./ssdeep-n.sh | logger -t ssdeep-n

To convert this to a multithreaded NRT IDS, we would do the following:

vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n \
-K 600 | xpipes -P 12 -c './ssdeep-n.sh | logger -t ssdeep-n'

Now, instead of a single instance of the analyzer, we have 12. Our system has 16 processors, so this doesn’t fully load the system, but a much larger fraction of the total computing resources is now used. Taking a look at this in top:

top - 12:56:25 up 102 days, 19:35, 4 users, load average: 17.30, 16.94, 9.
Tasks: 295 total, 7 running, 288 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.5%us, 54.7%sy, 0.1%ni, 28.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%
Mem: 74175036k total, 73891608k used, 283428k free, 338324k buffers
Swap: 76218360k total, 155056k used, 76063304k free, 72417572k cached

PID VIRT RES S %CPU %MEM COMMAND
10345 66128 3492 R 22.6 0.0 ssdeep-n.sh
10346 66128 3452 S 22.3 0.0 ssdeep-n.sh
10322 66128 3456 R 21.9 0.0 ssdeep-n.sh
10336 66120 3440 R 21.9 0.0 ssdeep-n.sh
10337 66124 3464 R 21.9 0.0 ssdeep-n.sh
10343 66128 3488 R 21.9 0.0 ssdeep-n.sh
10330 66128 3464 R 21.6 0.0 ssdeep-n.sh
10342 66128 3444 S 21.6 0.0 ssdeep-n.sh
10351 66120 3476 S 21.6 0.0 ssdeep-n.sh
10326 66132 3476 S 20.9 0.0 ssdeep-n.sh
10340 66124 3448 S 20.9 0.0 ssdeep-n.sh
10329 66120 3452 S 19.9 0.0 ssdeep-n.sh
10302 350m 297m S 11.3 0.4 vortex
5 0 0 S 0.3 0.0 migration/1
32 0 0 S 0.3 0.0 migration/10

Beautiful, isn’t it?

If run to completion, the multithreaded version finishes in minutes while the single-threaded version takes hours.
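If you want to measure the difference yourself, one simple approach (assuming a bash shell, where time reports on the whole pipeline) is:

$ time vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n -K 600 | \
  xpipes -P 12 -c './ssdeep-n.sh | logger -t ssdeep-n'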

As should be clear from the above, the -P option specifies the number of child processes to spawn. For highly computationally expensive analyzers, typical values range from 2 to a few less than the number of processors in the system. For high-latency analyzers you can use quite a few more, but there is an arbitrary limit of 1000.
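For example, one rough way to size -P on Linux (this convention is mine, not something xpipes does for you) is to derive it from the processor count, leaving a few processors free for vortex and the rest of the system:

$ NPROC=$(grep -c ^processor /proc/cpuinfo)
$ vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n -K 600 | \
  xpipes -P $(( NPROC - 4 )) -c './ssdeep-n.sh | logger -t ssdeep-n'

On the 16-processor system above, this works out to the -P 12 used earlier.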

One of the coolest features of xpipes is that it provides a unique identifier for each child process in the form of an environment variable. For each child process it spawns, xpipes sets the environment variable XPIPES_INDEX to an incrementing integer starting at zero. Furthermore, since the command specified is interpreted by a shell, XPIPES_INDEX can be used in the command. Imagine that instead of using logger to write a log, we want to write directly to a file. If you try something like:

$ vortex | xpipes -P 8 -c "analyzer > log.txt"

You would find that the log file gets clobbered by multiple instances trying to write to it at the same time. However, you could do the following (note the single quotes, which ensure $XPIPES_INDEX is expanded by the shell xpipes spawns for each child rather than by your interactive shell):

$ vortex | xpipes -P 8 -c "analyzer > log_$XPIPES_INDEX.txt"

You’d end up with 8 log files, log_0.txt through log_7.txt, which you could cat together if you wanted. Similarly, if you want to lock each analyzer to a separate core, say cores 2 through 9, you could do something like the following:

$ vortex | xpipes -P 8 -c "taskset -c $[ $XPIPES_INDEX + 2 ] analyzer"

I think you get the idea. Just having a predictable identifier available to both the interpreting shell and the analyzer program opens a lot of doors.
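As one more illustration (the analyzer and the paths here are hypothetical), XPIPES_INDEX can be used to give each child its own scratch directory so temporary files never collide:

$ vortex | xpipes -P 8 -c 'mkdir -p /dev/shm/work_$XPIPES_INDEX && TMPDIR=/dev/shm/work_$XPIPES_INDEX analyzer'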

Note that if you want to specify the command on the command line, you can do so with the -c option. This can admittedly get a little tricky at times because of multiple layers of quoting, etc. Alternatively, xpipes can read the command to execute from a file. For example:

$ echo 'analyzer "crazy quoted options"' > analyzer.cmd
$ vortex | xpipes -P 8 -f analyzer.cmd

That’s the basics of parallel processing for NRT IDS, the vortex way. While vortex takes care of all the real-time constraints and the heavy lifting of network stream reassembly, xpipes takes care of the multithreading, so all your analyzer has to do is analysis. While vortex’s primary goal has never been absolute performance, I have seen vortex used to perform both computationally expensive and relatively high-latency analysis that would break a conventional IDS.

This largely fulfills the obligation I took on when I started this series of vortex howto articles. I hope it has been helpful to the community, and I hope that someone who has read the series will be able to use vortex without too much trouble if a situation ever arises where it is the right tool for the job.

If there are other topics you would like discussed or explained, feel free to suggest them. For example, I’ve considered an article on tuning Linux and vortex for lossless packet capture, but I think the README and error messages cover this pretty well. I’ve also considered discussing the details of the performance-relevant parameters in vortex and xpipes, but most of these work very well for most situations without any changes.

Again, I hope this series will help people derive some benefit from vortex. I also want to reiterate my acknowledgments to Lockheed Martin for sharing vortex with the community as open source.
