Throughout the series I’d like to demonstrate the use of vortex through the following examples:
- How to use vortex to build a network surveillance tool
- How to use vortex to build a near real-time deep content analysis IDS
- How to use vortex as a network forensics tool
- How to use vortex (and xpipes) to do the above in a highly scalable manner including leveraging highly parallel processors
In demonstrating these uses of vortex, my focus will be to explain through example the non-intuitive aspects of vortex and to show in rough proof-of-concept form what vortex can be used to do. I’ll also try to compare vortex to other tools to show where vortex adds value and where you’re better off using something else.
Before I begin, I’d like to refer the reader to another blog, securityfu, which has a nice introduction to vortex entitled “Vortex IDS - Get Super Snagadocious on Ubuntu.” In that post, Toosmooth provides an excellent overview of vortex. He also introduces some ideas for tools that could be built on top of vortex, especially deep email analysis, which seems to be something for which vortex is very well suited.
I’d also like to clarify my relationship to vortex. Vortex was written and shared with the community by Lockheed Martin. The Charles Smutz mentioned in the changelogs, etc., is the same person as the Charles Smutz who authors this blog, the difference being that the former acts as a Lockheed Martin employee and the latter as an individual. To be explicit, this blog is in no way sponsored or endorsed by Lockheed Martin and expresses my personal views and opinions as a security researcher.
Ok, now on with the real material. Our goal in this segment of the vortex howto series will be to develop a mail relay (client) fingerprinting tool. This could just as easily be an FTP, HTTP, etc. client fingerprinting tool.
The point will be to demonstrate how to use vortex to collect network payload data in a user-friendly way. We will be collecting characterizations of network clients, which will be useful for historical analysis. In many ways, this is very similar to Sourcefire RNA, but instead of characterizing network servers, we’ll be focusing on the clients. Also, we will not be focusing on creating transaction logs, for which Bro IDS is often very effective, depending on the information you want to collect. We will be focusing on building up an archive of network client fingerprints.
The first thing we need to do is collect network streams so we can analyze them. In the absence of a better data set, we’ll be using the DARPA Intrusion Detection Evaluation 2000 data set: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000/NT_dataset/outside.tcpdump.gz. It’s not very interesting, but it does have a fair number of complete SMTP connections, so it is adequate for demonstrating what we want to do. It’s also freely available, so you can follow along on your own if you want.
The Basics
First of all, as with tcpdump, you use the -i option to collect network data from a live capture device. To replay previously captured packets (e.g., from a pcap file), use the -r option. In this example, we’ll be replaying pcaps, but the same techniques would work equally well on a live network analyzer.
Next, we need to specify where to put the data. The -t option does this. If you don’t specify anything, stream files end up in your current working directory. If you’re going to be spooling raw streams to disk for archival, specifying a directory on disk is fine. However, if you are going to be processing the streams and selectively writing small portions of the data to disk or a DB, spooling the streams to a ramdisk of some sort is often the right thing to do. /dev/shm is the location of a tmpfs mount, common on modern Linux distros, which works perfectly for this purpose.
Since we’re going to concern ourselves with SMTP for the moment, we’ll use a BPF of “tcp port 25”. There are some issues with doing this that we’ll address in one of the other articles in this series.
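Because -r and -i are otherwise interchangeable, the same command can be pointed at a pcap or at a live interface. Below is a minimal bash sketch of that choice, using only the -r, -i, -t and -f options described in this post. The function name `run_vortex` and the `VORTEX` override variable are hypothetical, introduced here for illustration, and the tmpfs check assumes GNU coreutils `stat`.

```shell
#!/bin/bash
# Hypothetical wrapper around vortex (not part of vortex itself).
# Replays a pcap with -r when the source is a regular file; otherwise treats
# the source as a live interface name and captures with -i.
run_vortex() {
    local src=$1 filter=$2 mode=-i spool=.
    [ -f "$src" ] && mode=-r
    # Spool to /dev/shm only if it really is a tmpfs mount (GNU stat prints
    # the file system type with -f -c %T); otherwise use the current
    # directory, which is where vortex writes streams when -t is not given.
    if [ "$(stat -f -c %T /dev/shm 2>/dev/null)" = tmpfs ]; then
        spool=/dev/shm
    fi
    # VORTEX (hypothetical) lets another command stand in, e.g. echo for a dry run.
    "${VORTEX:-vortex}" "$mode" "$src" -t "$spool" -f "$filter"
}
```

For example, `run_vortex outside.tcpdump "tcp port 25"` replays the DARPA capture, `run_vortex eth0 "tcp port 25"` would capture live from an interface assumed to be named eth0, and `VORTEX=echo run_vortex eth0 "tcp port 25"` only prints the arguments it would pass.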
So to extract the streams we’re interested in we’d do the following:
$ mkdir streams
$ vortex -r outside.tcpdump -f "tcp port 25" -t streams
Couldn't set capture thread priority!
streams/196.37.75.158:1052s172.16.114.50:25
streams/196.37.75.158:1052c172.16.114.50:25
streams/196.37.75.158:1104s172.16.114.169:25
streams/196.37.75.158:1104c172.16.114.169:25
streams/196.37.75.158:1106s172.16.114.207:25
streams/196.37.75.158:1106c172.16.114.207:25
…
streams/197.218.177.69:22094s172.16.114.207:25
streams/197.218.177.69:22094c172.16.114.207:25
streams/195.115.218.108:30802s172.16.114.168:25
streams/195.115.218.108:30802c172.16.114.168:25
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 5455133 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1718 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0
We’ve extracted all the relevant TCP streams from the pcap and stored them in files. At this point, we haven’t done anything that couldn’t be done just as easily with a myriad of other tools such as tcpflow, tcpick, etc., so let’s keep moving.
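Each stream file name in the listing encodes the connection: client IP and port, a direction letter, then server IP and port, where "s" marks data going from the client to the server and "c" data going to the client. A small bash sketch that splits such a name into its parts; `parse_stream_name` is a hypothetical helper, and the pattern assumes IPv4 addresses as in this data set.

```shell
#!/bin/bash
# Split a vortex stream file name, e.g. 196.37.75.158:1052s172.16.114.50:25,
# into: client IP, client port, direction (s or c), server IP, server port.
# Assumes IPv4 addresses; returns 1 for names that do not fit the pattern.
parse_stream_name() {
    local name re='^([0-9.]+):([0-9]+)([sc])([0-9.]+):([0-9]+)$'
    name=$(basename "$1")
    if [[ $name =~ $re ]]; then
        printf '%s %s %s %s %s\n' "${BASH_REMATCH[@]:1:5}"
    else
        return 1
    fi
}
```

For instance, `parse_stream_name streams/196.37.75.158:1052s172.16.114.50:25` prints `196.37.75.158 1052 s 172.16.114.50 25`.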
Since we’re interested in characterizing the SMTP client (the relay forwarding an email), let’s look at an example stream showing data transmitted from the client to the server:
$ head -n 15 streams/135.13.216.191:11896s172.16.113.204:25
EHLO alpha.apple.edu
HELO alpha.apple.edu
MAIL From: <ansgarz@alpha.apple.edu>
RCPT To: <jouniw@goose.eyrie.af.mil>
DATA
Received: (from mail@localhost) by alpha.apple.edu (SMI-8.6/SMI-SVR4)
id: CAA16711; Sat, 7 Aug 1999 14:19:07 -0400
Date: Sat, 7 Aug 1999 14:19:07 -0400
To: jouniw@goose.eyrie.af.mil
Subject: To Introduction exposes us an object
Message-Id: <19990807141907.caa16711>
To Introduction exposes us an object can cause
The type your own memory improved over The
normal density classifier parameters; and
Let’s say, for the sake of this exercise, that we’re interested in the HELO command (some clients start with HELO and some lead off with EHLO), the sender’s domain (usually each SMTP relay forwards mail for a relatively small number of domains, often only one), and the relay server’s hostname and software name as reported in the first Received: header.
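Before building the full analyzer, it can help to test the extraction of those three fields on a single stream. The bash sketch below uses hypothetical helper names (`helo_verb`, `sender_domain`, `relay_by`). Unlike a grep for "by" anywhere in the stream, `relay_by` only looks at Received: header lines, so a "by" in the message body is not picked up; it assumes the "by" clause sits on the Received: line itself, as in the sample. SMTP lines end in CRLF, so each pattern stops before the trailing carriage return.

```shell
#!/bin/bash
# Hypothetical helpers; each takes one client->server ("s") stream file.

# First word of the first line: HELO or EHLO.
helo_verb() {
    head -n 1 "$1" | awk '{ print toupper($1) }'
}

# Domain of the first MAIL FROM address, lowercased.
sender_domain() {
    grep -i -m 1 '^MAIL FROM:' "$1" | sed -E 's/^.*@([^>]*)>.*$/\1/' |
        tr '[:upper:]' '[:lower:]'
}

# Host and optional "(software)" following " by " on the first Received:
# header line that has such a clause.
relay_by() {
    awk 'tolower($1) == "received:" {
             if (match($0, / by [0-9A-Za-z.-]+( \([^)]*\))?/)) {
                 print substr($0, RSTART + 4, RLENGTH - 4)
                 exit
             }
         }' "$1"
}
```

Run against the sample stream above, these should yield `EHLO`, `alpha.apple.edu` and `alpha.apple.edu (SMI-8.6/SMI-SVR4)`.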
Our Analyzer
The following shell script, smtp_fingerprint.sh, when combined with vortex, collects this info and stores it in one file per client IP address, each line containing a unique fingerprint:
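#!/bin/bash
# smtp_fingerprint.sh: vortex analyzer. Reads stream file names on STDIN and
# appends each new fingerprint to a file named after the client IP.
while IFS= read -r STREAM_FILE
do
    # Client IP: the part of the stream file name before the first ":"
    CLIENT_IP=$(basename "$STREAM_FILE" | awk -F: '{ print $1 }')
    # HELO or EHLO: first word of the first line
    HELO_CMD=$(head -n 1 "$STREAM_FILE" | awk '{ print $1 }')
    # Sender domain: text between "@" and ">" on the MAIL FROM line
    SENDER_DOMAIN=$(grep -i "^MAIL FROM:" "$STREAM_FILE" | \
        sed -r 's/^.*@(.*)>.*$/\1/')
    # Relay host and optional "(software)" from the first "by" clause
    BY_STRING=$(grep -E -o -h "by [0-9a-zA-Z.-]+( \(.*\))?" \
        "$STREAM_FILE" | head -n 1 | sed 's/^by //')
    FINGERPRINT="$HELO_CMD $SENDER_DOMAIN $BY_STRING"
    # Record the fingerprint unless this exact line is already in the file.
    # Without -q, grep also echoes already-known fingerprints to STDOUT.
    if ! grep -x -F "$FINGERPRINT" "$CLIENT_IP" 2>/dev/null
    then
        echo "$FINGERPRINT" >> "$CLIENT_IP"
    fi
    # Delete the stream once it has been analyzed
    rm -f "$STREAM_FILE"
done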
Note that we’ve followed the basic paradigm of a vortex analyzer: read a filename from STDIN, analyze it, delete it.
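That read, analyze, delete loop is the reusable part of any vortex analyzer. Here is a minimal bash sketch of it with the SMTP-specific logic swapped for a placeholder; `run_analyzer` and `analyze_stream` are hypothetical names, and the placeholder only prints each stream’s name and size in bytes.

```shell
#!/bin/bash
# Placeholder analysis (hypothetical): print the stream's name and byte count.
analyze_stream() {
    printf '%s %s\n' "$(basename "$1")" "$(( $(wc -c < "$1") ))"
}

# Read one stream file name per line from STDIN, analyze the file, then
# delete it so the spool directory (e.g. /dev/shm) does not fill up.
run_analyzer() {
    local stream_file
    while IFS= read -r stream_file; do
        [ -f "$stream_file" ] || continue   # skip names with no file behind them
        analyze_stream "$stream_file"
        rm -f "$stream_file"
    done
}
```

To use it as a filter behind vortex, append a call to `run_analyzer` at the end of the file and replace `analyze_stream` with real analysis.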
Also note that this shell script is very quick and dirty. There are so many things wrong with it, we won’t even list them. However, it doesn’t take much vision to see what 30 to 50 lines of Perl/Python/Ruby code could do, possibly with a DB.
Pulling it Together
We’ve got a couple of other things left to work out. First, we are only interested in “s” streams, the streams going from the TCP client to the server. As such, we’re going to set the client collection size (-C) to zero. While collecting complete client-to-server streams is fine, in our case we’re really only interested in the first few lines of the stream, so we’re going to set the to-server collection size (-S) to 4 KB (4096 bytes). Since we’re snarfing, analyzing, then purging, we’re going to store the streams temporarily in ramdisk (/dev/shm).
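Whether 4096 bytes is enough depends on how far into each stream the first Received: header ends. One rough way to check, sketched below with hypothetical helper names, is to measure that offset over the untruncated streams collected earlier into streams/ (that run reported VTX_LIMIT: 0). The glob assumes IPv4 stream names, which contain no letter "s" other than the direction marker.

```shell
#!/bin/bash
# Byte offset at which the first "Received:" line of a stream ends.
# LC_ALL=C makes awk's length() count bytes; +1 accounts for each "\n"
# (a CRLF line's "\r" is already part of $0).
received_offset() {
    LC_ALL=C awk '{ n += length($0) + 1 } /^Received:/ { print n; exit }' "$1"
}

# Largest such offset across the client->server ("s") streams in a directory.
max_received_offset() {
    local f
    for f in "$1"/*:*s*:*; do
        [ -f "$f" ] && received_offset "$f"
    done | sort -n | tail -n 1
}
```

`max_received_offset streams` prints the largest offset seen; if it sits well below 4096, truncating at -S 4096 loses nothing the fingerprint needs.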
Our finished product is as follows:
$ mkdir smtp_fingerprints
$ cd smtp_fingerprints
$ vortex -r ../outside.tcpdump -t /dev/shm -S 4096 -C 0 -f \
"tcp port 25" | smtp_fingerprint.sh
Couldn't set capture thread priority!
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO finch.eyrie.af.mil finch.eyrie.af.mil (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)
…
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 3323358 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1455 VTX_LIMIT: 263 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0
Hint--VTX_LIMIT: Streams truncated due to size limits. If not desired,
adjust stream size limits accordingly (-C, -S).
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
…
EHLO pluto.plum.net pluto.plum.net (SMI-8.6/SMI-SVR4)
EHLO epsilon.pear.com epsilon.pear.com (SMI-8.6/SMI-SVR4)
$
Alright, let’s look at the output of our masterpiece.
$ ls
135.13.216.191 172.16.112.194 172.16.112.50 172.16.113.84 172.16.114.169 194.27.251.21 195.73.151.50 197.182.91.233
135.8.60.182 172.16.112.20 172.16.113.105 172.16.114.148 172.16.114.207 194.7.248.153 196.227.33.189 197.218.177.69
172.16.112.149 172.16.112.207 172.16.113.204 172.16.114.168 172.16.114.50 195.115.218.108 196.37.75.158
Cool, a file per client IP, as we expected.
Let’s take a peek at a few:
$ cat 135.13.216.191
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.194
EHLO falcon.eyrie.af.mil falcon.eyrie.af.mil (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.20
EHLO zeno.eyrie.af.mil hobbes.eyrie.af.mil (8.8.7/8.8.7)
The digests, just as we wanted. We analyzed 1719 SMTP streams. How many digests are there?
$ wc -l *
1 135.13.216.191
1 135.8.60.182
1 172.16.112.149
…
1 196.227.33.189
1 196.37.75.158
2 197.182.91.233
1 197.218.177.69
24 total
A relatively low number, indicating a lot of duplicate fingerprints. OK, most files have one line, which is to be expected. Let’s look at the one with more than one.
$ cat 197.182.91.233
EHLO marslistserv.com mars.avocado.net (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)
Perfect, email relays that relay mail for more than one domain have more than one digest.
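Eyeballing wc output works for two dozen files; for a larger archive, a one-line awk query can list the multi-fingerprint relays directly. A sketch, with `multi_fingerprint_ips` as a hypothetical helper name:

```shell
#!/bin/bash
# Print the client IPs (fingerprint file names) whose file has a second line,
# i.e. more than one distinct fingerprint. FNR == 2 fires once per such file.
multi_fingerprint_ips() {
    awk 'FNR == 2 { n = split(FILENAME, part, "/"); print part[n] }' "$1"/*
}
```

Run from the parent directory, `multi_fingerprint_ips smtp_fingerprints` should list only 197.182.91.233 for this data set.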
What can we do with this monstrosity we’ve created? Let’s run a few queries on our awkward DB, CLI style:
What server relay software is used and with what frequency?
$ cat * | awk '{ print $NF }' | sort | uniq -c
1 (8.8.0/8.8.5)
1 (8.8.7/8.8.7)
22 (SMI-8.6/SMI-SVR4)
Clearly this is a data miner’s paradise ;) A whole 3 different types of SMTP relay software are used.
Who has ever delivered mail for (or claimed to be) orange.com?
$ grep "orange.com" *
195.73.151.50:EHLO lambda.orange.com lambda.orange.com (SMI-8.6/SMI-SVR4)
I think you get the idea.
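One more query in the same CLI style: how many distinct client IPs have relayed mail for each sender domain? The sketch assumes every fingerprint line has the three fields written by smtp_fingerprint.sh (HELO command, sender domain, relay "by" string), so field 2 is the sender domain; `ips_per_domain` is a hypothetical helper name.

```shell
#!/bin/bash
# For each sender domain (field 2), count the distinct fingerprint files,
# i.e. client IPs, that contain it. Output: "domain count", sorted by domain.
ips_per_domain() {
    awk '{ print $2, FILENAME }' "$1"/* | sort -u |
        awk '{ count[$1]++ } END { for (d in count) print d, count[d] }' | sort
}
```

Piping the result through `awk '$2 > 1'` keeps only domains seen from more than one IP, which is where spoofed or shared relays would show up.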
So the dataset we used is pretty limited and the analyzer we created is certainly contrived, but I hope this demonstrates the type of thing you could do using vortex as the basis for a network surveillance tool. You could collect and store or mine just about any data you wanted to. Because vortex is used, the analyst doesn’t have to worry about extracting data from packets. Network data appears as files, which is perfect for the CLI ninja. While not explicitly shown here, since vortex handles all the real-time constraints, our script could, with a few minor modifications, run on a production network of decent size and still perform fine.
I’ve explained the most basic usage of vortex and demonstrated its use for something that can’t as easily be done with any other tools that I know of. In future installments of this series we’ll demonstrate various other aspects of vortex and how it is used.