Monday, March 22, 2010

Vortex Howto Series: Network Surveillance

While my last post was very high level, clearly in the realm of pontification, I’d like to come down to the other extreme and present a series of very technical howtos related to vortex, a utility for analysis of TCP stream data.

Throughout the series I’d like to demonstrate the use of vortex through the following examples:
  • How to use vortex to build a network surveillance tool
  • How to use vortex to build a near real-time deep content analysis IDS
  • How to use vortex as a network forensics tool
  • How to use vortex (and xpipes) to do the above in a highly scalable manner including leveraging highly parallel processors

In demonstrating these uses of vortex, my focus will be to explain through example the non-intuitive aspects of vortex and to show in rough proof-of-concept form what vortex can be used to do. I’ll also try to compare vortex to other tools to show where vortex adds value and where you’re better off using something else.

Before I begin, I’d like to refer the reader to another blog, securityfu, which has a nice introduction to vortex entitled Vortex IDS - Get Super Snagadocious on Ubuntu. Toosmooth provides an excellent overview of vortex. He also introduces some ideas of tools that could be built on top of vortex, especially deep email analysis, which seems to be something for which vortex is very well suited.

I’d also like to clarify my relationship to vortex. Vortex was written and shared with the community by Lockheed Martin. The Charles Smutz mentioned in the changelogs, etc. is the same person as the Charles Smutz who authors this blog, except that the former does so as a Lockheed Martin employee and the latter does so as an individual. To be explicit, this blog is in no way sponsored or endorsed by Lockheed Martin and expresses my personal views and opinions as a security researcher.

Ok, now on with the real material. Our goal in this segment of the vortex howto series will be to develop a mail relay (client) fingerprinting tool. This could just as easily be a client fingerprinting tool for FTP, HTTP, or another protocol.

The point will be to demonstrate how to use vortex to collect network payload data in a user friendly way. We will be collecting characterizations about network clients which will be useful for historical analysis. In many ways, this is very similar to Sourcefire RNA but instead of characterizing network servers, we’ll be focusing on the clients. Also, we will not be focusing on creating transaction logs, for which Bro IDS is often very effective, depending on the information you want to collect. We will be focusing on building up an archive of network client fingerprints.

The first thing we need to do is collect network streams so we can analyze them. In the absence of a better data set, we’ll be using the DARPA Intrusion Detection Evaluation 2000 data set: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000/NT_dataset/outside.tcpdump.gz. It’s not very interesting, but it does have a fair number of complete SMTP connections, so it is adequate for demonstrating what we want to do. It's also freely available, so you can follow along on your own if you want.

The Basics

First of all, as with tcpdump, you use the -i option to collect network data from a live capture device. To replay dead packets, e.g. from a pcap file, use the -r option. In this example, we’ll be replaying pcaps, but the same techniques would work equally well on a live network analyzer.

Next, we need to specify where to put the data. The -t option does this. If you don’t specify anything, stream files end up in your current working directory. If you’re going to be spooling raw streams to disk for archival, specifying a directory on disk is fine. However, if you are going to be processing the streams and selectively writing small portions of the data to disk/DB, spooling the streams to a ramdisk of some sort is often the right thing to do. /dev/shm is the location of a tmpfs mount common on modern Linux distros, which works perfectly for this purpose.
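As a rough sketch of the spooling choice described above, the helper below creates a private scratch directory under /dev/shm when that mount is present and writable, and falls back to the regular temp directory otherwise. The function name pick_spool_dir and the vortex_streams prefix are made up for illustration; they are not part of vortex.

```bash
#!/bin/bash
# pick_spool_dir: hypothetical helper, not part of vortex.
# Prints the path of a freshly created, private spool directory.
pick_spool_dir() {
    local base=/dev/shm
    if [ -d "$base" ] && [ -w "$base" ]; then
        # RAM-backed on distros that mount tmpfs at /dev/shm
        mktemp -d "$base/vortex_streams.XXXXXX"
    else
        # Fallback: ordinary temp directory (probably on disk)
        mktemp -d "${TMPDIR:-/tmp}/vortex_streams.XXXXXX"
    fi
}

SPOOL=$(pick_spool_dir)
echo "spooling streams to $SPOOL"

# The directory would then be handed to vortex with -t, for example:
#   vortex -r outside.tcpdump -f "tcp port 25" -t "$SPOOL"
```

Using a dedicated subdirectory rather than /dev/shm itself keeps stream files from different runs apart and makes cleanup a single rmdir once the streams have been consumed.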

Since we’re going to concern ourselves with SMTP for the moment, we’ll use a BPF of “tcp port 25”. There are some issues with doing this that we’ll address in one of the other articles in this series.

So to extract the streams we’re interested in we’d do the following:
$ mkdir streams
$ vortex -r outside.tcpdump -f "tcp port 25" -t streams
Couldn't set capture thread priority!
streams/196.37.75.158:1052s172.16.114.50:25
streams/196.37.75.158:1052c172.16.114.50:25
streams/196.37.75.158:1104s172.16.114.169:25
streams/196.37.75.158:1104c172.16.114.169:25
streams/196.37.75.158:1106s172.16.114.207:25
streams/196.37.75.158:1106c172.16.114.207:25

streams/197.218.177.69:22094s172.16.114.207:25
streams/197.218.177.69:22094c172.16.114.207:25
streams/195.115.218.108:30802s172.16.114.168:25
streams/195.115.218.108:30802c172.16.114.168:25
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 5455133 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1718 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0

We’ve extracted all the relevant TCP streams from the pcap and stored them in files. At this point, we haven’t done anything that couldn’t be done just as easily with a myriad of other tools such as tcpflow, tcpick, etc., so let’s keep moving.
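The filenames in the listing above appear to follow the pattern client_ip:client_port, then a one-letter direction (s for data sent to the server, c for data sent to the client), then server_ip:server_port. Assuming that pattern holds (it is inferred from the output here, not taken from the vortex documentation, and only covers IPv4), a small helper can split a name into its parts. The name parse_stream_name is hypothetical.

```bash
#!/bin/bash
# parse_stream_name: hypothetical helper, not part of vortex.
# Input:  a stream filename such as streams/196.37.75.158:1052s172.16.114.50:25
# Output: "client_ip client_port direction server_ip server_port"
#         (nothing is printed if the name does not fit the assumed pattern)
parse_stream_name() {
    basename "$1" | sed -n -E \
        's/^([0-9.]+):([0-9]+)([sc])([0-9.]+):([0-9]+)$/\1 \2 \3 \4 \5/p'
}

parse_stream_name "streams/196.37.75.158:1052s172.16.114.50:25"
# prints: 196.37.75.158 1052 s 172.16.114.50 25
```

With the fields separated like this, an analyzer can, for example, skip "c" files or group streams by server port without any packet parsing.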

Since we’re interested in characterizing the smtp client (relay forwarding an email), let’s look at an example stream showing data transmitted from the client to the server:

$ head -n 15 streams/135.13.216.191:11896s172.16.113.204:25
EHLO alpha.apple.edu
HELO alpha.apple.edu
MAIL From: <ansgarz@alpha.apple.edu>
RCPT To: <jouniw@goose.eyrie.af.mil>
DATA
Received: (from mail@localhost) by alpha.apple.edu (SMI-8.6/SMI-SVR4)
id: CAA16711; Sat, 7 Aug 1999 14:19:07 -0400
Date: Sat, 7 Aug 1999 14:19:07 -0400
To: jouniw@goose.eyrie.af.mil
Subject: To Introduction exposes us an object
Message-Id: <19990807141907.caa16711>

To Introduction exposes us an object can cause
The type your own memory improved over The
normal density classifier parameters; and

Let’s say for the sake of this exercise, we’re interested in the HELO command (some clients start with HELO and some lead off with EHLO), the sender’s domain (usually each SMTP relay forwards mail for a relatively small number of domains, often only one), and the relay server's hostname and software name as reported in the first Received line.

Our Analyzer

The following shell script, when combined with vortex, would collect this info and store it in a file per IP address, each line containing a unique fingerprint:

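#!/bin/bash
# Quick-and-dirty SMTP client fingerprinter for use with vortex.
# Reads one stream filename per line from STDIN.

while read -r STREAM_FILE
do
    # The client IP is everything before the first ':' in the stream filename
    CLIENT_IP=$(basename "$STREAM_FILE" | awk -F: '{ print $1 }')
    # First word of the first line: HELO or EHLO
    HELO_CMD=$(head -n 1 "$STREAM_FILE" | awk '{ print $1 }')
    # Domain part of the envelope sender address
    SENDER_DOMAIN=$(grep -i "^MAIL FROM:" "$STREAM_FILE" | \
        sed -r 's/^.*@(.*)>.*$/\1/g')
    # Hostname and optional (software) string from the first "by ..." match
    BY_STRING=$(grep -E -o -h "by [0-9a-zA-Z.-]+( \(.*\))?" \
        "$STREAM_FILE" | head -n 1 | sed 's/by //g')

    FINGERPRINT="$HELO_CMD $SENDER_DOMAIN $BY_STRING"

    # Append the fingerprint to the per-client file only if it is new.
    # grep echoes fingerprints that are already on file to STDOUT.
    if ! grep -F "$FINGERPRINT" "$CLIENT_IP" 2>/dev/null
    then
        echo "$FINGERPRINT" >> "$CLIENT_IP"
    fi

    # Done with this stream; purge it
    rm "$STREAM_FILE"

done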
download smtp_fingerprint.sh

Note that we’ve followed the basic paradigm of a vortex analyzer: read a filename from STDIN, analyze it, delete it.
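The read/analyze/delete paradigm can be separated from any particular analysis. The skeleton below is a sketch of that idea: run_analyzer handles the loop and cleanup, while analyze_stream is a placeholder (here it just reports the stream size in bytes) that you would swap for real logic. Both function names are made up for this example.

```bash
#!/bin/bash
# analyze_stream: placeholder analysis, prints "<stream name> <size in bytes>".
analyze_stream() {
    echo "$(basename "$1") $(wc -c < "$1" | tr -d ' ')"
}

# run_analyzer: read filenames from STDIN, analyze each one, then delete it.
run_analyzer() {
    while IFS= read -r STREAM_FILE
    do
        [ -f "$STREAM_FILE" ] || continue   # ignore lines that are not files
        analyze_stream "$STREAM_FILE"
        rm -f "$STREAM_FILE"                # purge the stream once analyzed
    done
}

# Small self-contained demo using a throwaway file instead of vortex output
demo=$(mktemp)
printf 'EHLO alpha.apple.edu\r\n' > "$demo"
echo "$demo" | run_analyzer

# With vortex, the filenames would arrive on the pipe instead, along the lines of:
#   vortex -r outside.tcpdump -f "tcp port 25" -t /dev/shm | ./analyzer.sh
# (with a call to run_analyzer as the last line of analyzer.sh)
```

Deleting each file as soon as it is analyzed is what keeps a ramdisk spool from filling up during a long capture.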

Also note that this shell script is very quick and dirty. There are so many things wrong with it, we won’t even list them. However, it doesn’t take much vision to see what 30 - 50 lines of perl/python/ruby code could do, possibly with a DB.
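To pick just one of those shortcomings as an illustration: the duplicate check uses grep -F without -x, so it matches substrings. A new fingerprint that happens to be contained in a longer one already on file is treated as a duplicate and never recorded. A possible fix, sketched below with a hypothetical record_fingerprint helper, is to require a whole-line match. Note that -q also silences grep, so unlike the script above, this version does not echo duplicates to the terminal.

```bash
#!/bin/bash
# record_fingerprint FILE FINGERPRINT
# Hypothetical helper: append FINGERPRINT to FILE unless an identical
# line is already present (-F fixed string, -x whole line, -q quiet).
record_fingerprint() {
    local file="$1" fingerprint="$2"
    if ! grep -F -x -q -e "$fingerprint" "$file" 2>/dev/null
    then
        echo "$fingerprint" >> "$file"
    fi
}

# Demo with made-up fingerprints in a throwaway file
db=$(mktemp)
record_fingerprint "$db" "EHLO a.example a.example (8.8.7/8.8.7)"
record_fingerprint "$db" "EHLO a.example a.example (8.8.7/8.8.7)"   # exact duplicate, skipped
record_fingerprint "$db" "EHLO a.example a.example"                 # substring, still recorded
cat "$db"
rm -f "$db"
```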

Pulling it Together

We’ve got a couple of other things left to work out. First, we are only interested in “s” streams--the streams going from the TCP client to the server. As such, we’re going to set the to-client collection size (-C) to zero. While collecting complete client-to-server streams is fine, in our case we’re really only interested in the first few lines of each stream, so we’re going to set the to-server collection size (-S) to 4 KB (4096 bytes).
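Size limits like these have a visible side effect: streams cut short by a limit are counted under VTX_LIMIT in the VORTEX_STATS summary line that vortex prints at the end of a run (the line format is the one shown in the run output in this post). If you save that output, a helper like the following can pull out a single counter. The name vortex_stat is made up, and the sample line is abbreviated.

```bash
#!/bin/bash
# vortex_stat NAME
# Hypothetical helper: read vortex output on STDIN and print the value of
# counter NAME from the VORTEX_STATS line ("NAME: value" pairs).
vortex_stat() {
    awk -v key="$1:" '
        /^VORTEX_STATS/ {
            for (i = 2; i < NF; i++)
                if ($i == key) print $(i + 1)
        }'
}

# Sample input: an abbreviated stats line in the format vortex prints
echo "VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 3323358 VTX_EST: 1719 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1455 VTX_LIMIT: 263" \
    | vortex_stat VTX_LIMIT
# prints: 263
```

A nonzero VTX_LIMIT is expected when the limit is deliberate, as it is here; it only signals a problem when you meant to capture whole streams.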

Since we’re snarfing, analyzing, then purging, we’re going to store the streams temporarily in ramdisk (/dev/shm).

Our finished product is as follows:
$ mkdir smtp_fingerprints
$ cd smtp_fingerprints
$ vortex -r ../outside.tcpdump -t /dev/shm -S 4096 -C 0 -f \
"tcp port 25" | smtp_fingerprint.sh
Couldn't set capture thread priority!
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO finch.eyrie.af.mil finch.eyrie.af.mil (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)

EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 3323358 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1455 VTX_LIMIT: 263 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0
Hint--VTX_LIMIT: Streams truncated due to size limits. If not desired,
adjust stream size limits accordingly (-C, -S).
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)

EHLO pluto.plum.net pluto.plum.net (SMI-8.6/SMI-SVR4)
EHLO epsilon.pear.com epsilon.pear.com (SMI-8.6/SMI-SVR4)
$
Alright, let’s look at the output of our masterpiece.
$ ls
135.13.216.191 172.16.112.194 172.16.112.50 172.16.113.84 172.16.114.169 194.27.251.21 195.73.151.50 197.182.91.233
135.8.60.182 172.16.112.20 172.16.113.105 172.16.114.148 172.16.114.207 194.7.248.153 196.227.33.189 197.218.177.69
172.16.112.149 172.16.112.207 172.16.113.204 172.16.114.168 172.16.114.50 195.115.218.108 196.37.75.158
Cool, a file per client IP as we expected.

Let’s take a peek at a few:
$ cat 135.13.216.191
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.194
EHLO falcon.eyrie.af.mil falcon.eyrie.af.mil (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.20
EHLO zeno.eyrie.af.mil hobbes.eyrie.af.mil (8.8.7/8.8.7)
The digests, just as we wanted. We analyzed 1719 smtp streams. How many digests are there?
$ wc -l *
1 135.13.216.191
1 135.8.60.182
1 172.16.112.149

1 196.227.33.189
1 196.37.75.158
2 197.182.91.233
1 197.218.177.69
24 total
A relatively low number, indicating a lot of duplicate fingerprints. Most files have one line, which is to be expected. Let's look at the one with more than one.
$ cat 197.182.91.233
EHLO marslistserv.com mars.avocado.net (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)
Perfect, email relays that relay mail for more than one domain have more than one digest.

What can we do with this monstrosity we’ve created? Let’s run a few queries on our awkward DB, CLI style:

What server relay software is used and with what frequency?
$ cat * | awk '{ print $NF }' | sort | uniq -c
1 (8.8.0/8.8.5)
1 (8.8.7/8.8.7)
22 (SMI-8.6/SMI-SVR4)
Clearly this is a data miner’s paradise ;) A whole 3 different types of SMTP relay software are used.

Who has ever delivered mail for (or claimed to be) orange.com?
$ grep "orange.com" *
195.73.151.50:EHLO lambda.orange.com lambda.orange.com (SMI-8.6/SMI-SVR4)
I think you get the idea.
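In the same spirit, here are two more queries wrapped as small functions, assuming the one-file-per-client layout and the "HELO_CMD SENDER_DOMAIN BY_STRING" line format produced by the script above. The function names are made up. The demo builds a throwaway directory from two of the fingerprint files shown earlier so it can run without vortex.

```bash
#!/bin/bash
# multi_fingerprint_clients DIR
# List clients (files) that hold more than one fingerprint, with the count.
multi_fingerprint_clients() {
    local f n
    for f in "$1"/*
    do
        [ -f "$f" ] || continue
        n=$(wc -l < "$f" | tr -d ' ')
        if [ "$n" -gt 1 ]; then
            echo "$(basename "$f") $n"
        fi
    done
}

# field_frequency DIR N
# Count how often each value of whitespace-separated field N occurs
# (1 = HELO/EHLO command, 2 = sender domain).
field_frequency() {
    cat "$1"/* | awk -v col="$2" '{ print $col }' | sort | uniq -c | sort -rn
}

# Demo on a copy of two fingerprint files from this post
demo_dir=$(mktemp -d)
printf '%s\n' "EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)" \
    > "$demo_dir/135.13.216.191"
printf '%s\n' "EHLO marslistserv.com mars.avocado.net (SMI-8.6/SMI-SVR4)" \
    "EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)" \
    > "$demo_dir/197.182.91.233"

multi_fingerprint_clients "$demo_dir"   # prints: 197.182.91.233 2
field_frequency "$demo_dir" 1           # HELO vs. EHLO usage
rm -r "$demo_dir"
```

Run from inside the smtp_fingerprints directory, the equivalent calls would be multi_fingerprint_clients . and field_frequency . 2.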

So the dataset we used is pretty limited and the analyzer we created is certainly contrived, but I hope this demonstrates the type of thing you could do using vortex as the basis for a network surveillance tool. You could collect and store or mine just about any data you wanted to. Because vortex is used, the analyst doesn’t have to worry about extracting data from packets. Network data appears as files, which is perfect for the CLI ninja. While not explicitly shown here, since vortex handles all the real-time constraints, our script could, with a few minor modifications, run on a production network of decent size and still perform fine.

I’ve explained the most basic usage of vortex and demonstrated its use for something that can’t as easily be done with any other tools that I know of. In future installments of this series we’ll demonstrate various other aspects of vortex and how it is used.
