Monday, March 22, 2010

Vortex Howto Series: Network Surveillance

While my last post was very high level, clearly in the realm of pontification, I’d like to come down to the other extreme and present a series of very technical howtos related to vortex, a utility for analysis of TCP stream data.

Throughout the series I’d like to demonstrate the use of vortex through the following examples:
  • How to use vortex to build a network surveillance tool
  • How to use vortex to build a near real-time deep content analysis IDS
  • How to use vortex as a network forensics tool
  • How to use vortex (and xpipes) to do the above in a highly scalable manner including leveraging highly parallel processors

In demonstrating these uses of vortex, my focus will be to explain through example the non-intuitive aspects of vortex and to show in rough proof-of-concept form what vortex can be used to do. I’ll also try to compare vortex to other tools to show where vortex adds value and where you’re better off using something else.

Before I begin, I’d like to refer the reader to another blog, securityfu, which has a nice introduction to vortex entitled Vortex IDS - Get Super Snagadocious on Ubuntu. Toosmooth provides an excellent overview of vortex. He also introduces some ideas of tools that could be built on top of vortex, especially deep email analysis, which seems to be something for which vortex is very well suited.

I’d also like to clarify my relationship to vortex. Vortex was written and shared with the community by Lockheed Martin. The Charles Smutz mentioned in the changelogs, etc. is the same person as the Charles Smutz who authors this blog, the difference being that the former does so as a Lockheed Martin employee and the latter as an individual. To be explicit, this blog is in no way sponsored or endorsed by Lockheed Martin and expresses my personal views and opinions as a security researcher.

Ok, now on with the real material. Our goal in this segment of the vortex howto series will be to develop a mail relay (client) fingerprinting tool. This could just as easily be an FTP, HTTP, etc client fingerprinting tool.

The point will be to demonstrate how to use vortex to collect network payload data in a user-friendly way. We will be collecting characterizations of network clients that will be useful for historical analysis. In many ways, this is very similar to Sourcefire RNA, but instead of characterizing network servers, we’ll be focusing on the clients. We will not be focusing on creating transaction logs, for which Bro IDS is often very effective, depending on the information you want to collect; rather, we will be focusing on building up an archive of network client fingerprints.

The first thing we need to do is collect network streams so we can analyze them. In absence of a better data set, we’ll be using the DARPA Intrusion Detection Evaluation 2000 data set: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000/NT_dataset/outside.tcpdump.gz. It’s not very interesting but it does have a fair amount of complete smtp connections so it is adequate for demonstrating what we want to do. It's also freely available so you can follow along on your own if you want.
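If you want to follow along, fetching and unpacking the capture looks like this:

$ wget http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000/NT_dataset/outside.tcpdump.gz
$ gunzip outside.tcpdump.gz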

The Basics

First of all, like tcpdump, you use the -i option to collect network data from a live capture device. To replay captured packets, e.g. from a pcap file, you use the -r option. In this example, we’ll be replaying pcaps, but the same techniques would work equally well on a live network.

Next, we need to specify where to put the data, which the -t option does. If you don’t specify anything, stream files end up in your current working directory. If you’re going to be spooling raw streams to disk for archival, specifying a directory on disk is fine. However, if you are going to be processing the streams and selectively writing small portions of the data to disk or a DB, spooling the streams to a ramdisk of some sort is often the right thing to do. /dev/shm is a tmpfs mount common on modern Linux distros which works perfectly for this purpose.
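For example, on most modern Linux distros you can confirm the tmpfs mount and give vortex its own scratch directory there. This is just a sketch--the directory name is arbitrary and eth0 is a hypothetical capture interface:

$ mount -t tmpfs
$ mkdir /dev/shm/vortex
$ vortex -i eth0 -t /dev/shm/vortex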

Since we’re going to concern ourselves with SMTP for the moment, we’ll use a BPF of “tcp port 25”. There are some issues with doing this that we’ll address in one of the other articles in this series.
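Since vortex takes standard BPF syntax, you can sanity check a filter with tcpdump before committing to it, e.g. by printing the first few matching packets:

$ tcpdump -r outside.tcpdump -c 5 "tcp port 25"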

So to extract the streams we’re interested in we’d do the following:
$ mkdir streams
$ vortex -r outside.tcpdump -f "tcp port 25" -t streams
Couldn't set capture thread priority!
streams/196.37.75.158:1052s172.16.114.50:25
streams/196.37.75.158:1052c172.16.114.50:25
streams/196.37.75.158:1104s172.16.114.169:25
streams/196.37.75.158:1104c172.16.114.169:25
streams/196.37.75.158:1106s172.16.114.207:25
streams/196.37.75.158:1106c172.16.114.207:25

streams/197.218.177.69:22094s172.16.114.207:25
streams/197.218.177.69:22094c172.16.114.207:25
streams/195.115.218.108:30802s172.16.114.168:25
streams/195.115.218.108:30802c172.16.114.168:25
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 5455133 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1718 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0

We’ve extracted all the relevant tcp streams from the pcap and stored them in files. At this point, we haven’t done anything that couldn’t be done just as easily with a myriad of other tools such as tcpflow, tcpick, etc., so let’s keep moving.
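For comparison, the rough tcpflow equivalent of the extraction above would be something like the following; tcpflow likewise writes one file per direction of each flow into the current directory:

$ tcpflow -r outside.tcpdump "tcp port 25"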

Since we’re interested in characterizing the smtp client (relay forwarding an email), let’s look at an example stream showing data transmitted from the client to the server:

$ head -n 15 streams/135.13.216.191:11896s172.16.113.204:25
EHLO alpha.apple.edu
HELO alpha.apple.edu
MAIL From: <ansgarz@alpha.apple.edu>
RCPT To: <jouniw@goose.eyrie.af.mil>
DATA
Received: (from mail@localhost) by alpha.apple.edu (SMI-8.6/SMI-SVR4)
id: CAA16711; Sat, 7 Aug 1999 14:19:07 -0400
Date: Sat, 7 Aug 1999 14:19:07 -0400
To: jouniw@goose.eyrie.af.mil
Subject: To Introduction exposes us an object
Message-Id: <19990807141907.caa16711>

To Introduction exposes us an object can cause
The type your own memory improved over The
normal density classifier parameters; and

Let’s say for the sake of this exercise, we’re interested in the HELO command (some clients start with HELO and some lead off with EHLO), the sender’s domain (usually each smtp relay forwards mail for a relatively small number of domains, often only one), and the relay server's hostname and software name as reported in the first received line.
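Before scripting anything, we can run each extraction by hand against the stream above to confirm the regular expressions do what we want (these expressions are just one way to slice it):

$ head -n 1 streams/135.13.216.191:11896s172.16.113.204:25 | awk '{ print $1 }'
EHLO
$ grep -i "^MAIL FROM:" streams/135.13.216.191:11896s172.16.113.204:25 | \
sed -r 's/^.*@(.*)>.*$/\1/g'
alpha.apple.edu
$ grep -E -o "by [0-9a-zA-Z.-]+( \(.*\))?" streams/135.13.216.191:11896s172.16.113.204:25 | \
head -n 1 | sed 's/by //g'
alpha.apple.edu (SMI-8.6/SMI-SVR4)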

Our Analyzer

The following shell script, when combined with vortex, would collect this info and store it in a file per IP address, each line containing a unique fingerprint:

#!/bin/bash

# Vortex analyzer: read one stream file name per line from STDIN,
# extract a client fingerprint, record it, then purge the stream.
while read STREAM_FILE
do
    # The client IP is the first colon-delimited field of the file name
    CLIENT_IP=`basename "$STREAM_FILE" | awk -F: '{ print $1 }'`
    # First word of the stream: EHLO or HELO
    HELO_CMD=`head -n 1 "$STREAM_FILE" | awk '{ print $1 }'`
    # Sender's domain: everything between the @ and the > in MAIL FROM
    SENDER_DOMAIN=`grep -i "^MAIL FROM:" "$STREAM_FILE" | \
        sed -r 's/^.*@(.*)>.*$/\1/g'`
    # Relay host and software from the first Received header's "by" clause
    BY_STRING=`grep -E -o -h "by [0-9a-zA-Z.-]+( \(.*\))?" \
        "$STREAM_FILE" | head -n 1 | sed 's/by //g'`

    FINGERPRINT="$HELO_CMD $SENDER_DOMAIN $BY_STRING"

    # Append the fingerprint to the per-IP file only if it isn't there yet
    # (on a duplicate, grep prints the existing match to STDOUT)
    if ! grep -F "$FINGERPRINT" "$CLIENT_IP" 2>/dev/null
    then
        echo "$FINGERPRINT" >> "$CLIENT_IP"
    fi

    # The analyzer, not vortex, is responsible for deleting the stream file
    rm "$STREAM_FILE"

done
download smtp_fingerprint.sh

Note that we’ve followed the basic paradigm of a vortex analyzer: read a filename from STDIN, analyze it, delete it.

Also note that this shell script is very quick and dirty. There are so many things wrong with it, we won’t even list them. However, it doesn’t take much vision to see what 30 - 50 lines of perl/python/ruby code could do, possibly with a DB.

Pulling it Together

We’ve got a couple of other things left to work out. First, we are only interested in “s” streams--the streams going from the TCP client to the server--so we’re going to set the client collection size (-C) to zero. While collecting complete client-to-server streams is fine, in our case we’re really only interested in the first few lines of each stream, so we’re going to set the to-server collection size (-S) to 4 KB.

Since we’re snarfing, analyzing, then purging, we’re going to store the streams temporarily in ramdisk (/dev/shm).

Our finished product is as follows:
$ mkdir smtp_fingerprints
$ cd smtp_fingerprints
$ vortex -r ../outside.tcpdump -t /dev/shm -S 4096 -C 0 -f \
"tcp port 25" | smtp_fingerprint.sh
Couldn't set capture thread priority!
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO jupiter.cherry.org jupiter.cherry.org (SMI-8.6/SMI-SVR4)
EHLO finch.eyrie.af.mil finch.eyrie.af.mil (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)

EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 3323358 VTX_EST: 1719 VTX_WAIT: 0 VTX_CLOSE_TOT: 1719 VTX_CLOSE: 1455 VTX_LIMIT: 263 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0
Hint--VTX_LIMIT: Streams truncated due to size limits. If not desired,
adjust stream size limits accordingly (-C, -S).
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)

EHLO pluto.plum.net pluto.plum.net (SMI-8.6/SMI-SVR4)
EHLO epsilon.pear.com epsilon.pear.com (SMI-8.6/SMI-SVR4)
$
Alright, let’s look at the output of our masterpiece.
$ ls
135.13.216.191 172.16.112.194 172.16.112.50 172.16.113.84 172.16.114.169 194.27.251.21 195.73.151.50 197.182.91.233
135.8.60.182 172.16.112.20 172.16.113.105 172.16.114.148 172.16.114.207 194.7.248.153 196.227.33.189 197.218.177.69
172.16.112.149 172.16.112.207 172.16.113.204 172.16.114.168 172.16.114.50 195.115.218.108 196.37.75.158
Cool, a file per client IP as we expected.

Let’s take a peek at a few:
$ cat 135.13.216.191
EHLO alpha.apple.edu alpha.apple.edu (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.194
EHLO falcon.eyrie.af.mil falcon.eyrie.af.mil (SMI-8.6/SMI-SVR4)
$ cat 172.16.112.20
EHLO zeno.eyrie.af.mil hobbes.eyrie.af.mil (8.8.7/8.8.7)
The digests, just as we wanted. We analyzed 1719 smtp streams. How many digests are there?
$ wc -l *
1 135.13.216.191
1 135.8.60.182
1 172.16.112.149

1 196.227.33.189
1 196.37.75.158
2 197.182.91.233
1 197.218.177.69
24 total
A relatively low number, indicating a lot of duplicate fingerprints. Ok, most files have one line, which is to be expected. Let’s look at the one with more than one:
$ cat 197.182.91.233
EHLO marslistserv.com mars.avocado.net (SMI-8.6/SMI-SVR4)
EHLO mars.avocado.net mars.avocado.net (SMI-8.6/SMI-SVR4)
Perfect, email relays that relay mail for more than one domain have more than one digest.

What can we do with this monstrosity we’ve created? Let’s run a few queries on our awkward DB, CLI style:

What server relay software is used and with what frequency?
$ cat * | awk '{ print $NF }' | sort | uniq -c
1 (8.8.0/8.8.5)
1 (8.8.7/8.8.7)
22 (SMI-8.6/SMI-SVR4)
Clearly this is a data miner’s paradise ;) A whole three different types of SMTP relay software are in use.

Who has ever delivered mail for (or claimed to be) orange.com?
$ grep "orange.com" *
195.73.151.50:EHLO lambda.orange.com lambda.orange.com (SMI-8.6/SMI-SVR4)
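Which greeting does each client lead with? The first field of every fingerprint is the HELO/EHLO command, so a tally is one pipeline away:
$ cat * | awk '{ print $1 }' | sort | uniq -c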
I think you get the idea.

So the dataset we used is pretty limited and the analyzer we created is certainly contrived, but I hope this demonstrates the type of thing you could do using vortex as the basis for a network surveillance tool. You could collect and store or mine just about any data you wanted to. Because vortex is used, the analyst doesn’t have to worry about extracting data from packets. Network data appears as files, which is perfect for the CLI ninja. While not explicitly shown here, since vortex handles all the real time constraints, with a few minor modifications, our script could run on a production network of decent size and still perform fine.
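To be concrete, a live deployment would differ from our replay only in the capture option; assuming eth0 is your monitoring interface (hypothetical), it might look like:

$ vortex -i eth0 -t /dev/shm -S 4096 -C 0 -f "tcp port 25" | smtp_fingerprint.sh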

I’ve explained the most basic usage of vortex and demonstrated its use for something that can’t be done as easily with any other tool I know of. In future installments of this series, we’ll demonstrate various other aspects of vortex and how it is used.

Wednesday, March 3, 2010

Developing Relevant Information Security Systems

In January, I presented at DC3 on Agile Development for Incident Response. I firmly believe that rapid engineering of information security systems is necessary to effectively combat sophisticated threats. I’ve also been struck lately by the lack of relevance of so much information security research and development.

One thing that I am adamant about, but that has largely been ignored by the mainstream security community, is the need to face sophisticated and determined attackers with a threat-focused response. A few others have already written extensively on this topic. The one reference I will make is to Mike Cloppert’s explanation of security intelligence, specifically his article on attacking the kill chain, which takes a conventional military construct (the kill chain) and applies it to information security.

Threat-focused analysis is necessary, but it is not sufficient. Unfortunately, current off-the-shelf security systems do not adequately support this approach. To perform security intelligence effectively, new security tools must be developed. Sadly, sophisticated attackers are not static targets; they change and evolve, and the enemies themselves change over time.

Working in the defense sector, I often try to contrast the cyber security world to the physical security world, predominantly for the purpose of finding ways to apply lessons of the past to present problems. The world has a long history of fighting wars and developing weapons systems. There must be some lessons from conventional weapons systems that can be applied to the realm of cyber security. As such, I’m going to use four conventional weapons systems to express allegorically some of my recent musings on effectively developing threat-focused information security systems.

Too Much, Too Late


It wasn’t too long ago that I visited the final manufacturing plant for the F22 Raptor. I have to admit, seeing the F22 in person makes the technological marvel it is that much sexier. However, while the F22 largely meets the goals the engineers set out to accomplish so many years ago and truly is far superior to any other fighter out there, the US decided we didn’t need it anymore, especially at the ~$150 million per-plane cost.

What went wrong? Latency. The threat landscape has changed significantly in the last three decades. If we had active enemies with technology that could only be adequately matched by the F22, then the F22 would be a bargain. However, since F22s aren’t particularly useful in wars like those in Iraq and Afghanistan, the cost is unjustifiable. To add insult to injury, it is conceivable that in a decade or two we could have a real need for the F22 that justifies the high price tag, but since the production lines and engineering will have long ceased, simply building more of them won’t be an easy option.

The information security equivalents of the F22 exist. They are technologically magnificent. They operate well for the missions they were designed for. Unfortunately, the cheese has moved. It’s hard to say if the technologies will be relevant in the future, but if they’re not relevant enough to justify further investment today, it will likely mean starting again from scratch.

Smart Bomb


Starkly contrasted to the F22 is a humble artillery round called the M982 Excalibur. This thing is everything the F22 isn’t--mundane, relatively cheap, and fabulously effective against today’s threats. It’s been very popular in Iraq and Afghanistan because its precision allows its use against insurgents close to non-targets or in complex terrain.

What makes the Excalibur great? Was it lack of technical challenges and problems during development? No. Radical new technologies? No.

The Excalibur is great because it is an ingenious marriage of technologies from other high tech devices (insanely expensive guided missiles) with a widely deployed, reliable, and economical infrastructure (howitzer artillery). While the Excalibur is relatively economical, the XM1156 promises to make similar capabilities really cheap.

We need more Excaliburs in the field of information assurance. We need to take our existing IT infrastructure and security tools and make the relatively minor tweaks necessary to keep pace with the changing threat landscape. Just as howitzer munitions have changed over time to keep pace with enemies, we often need only minor adjustments to our core IT infrastructure to respond to today’s attackers. However, if we can’t get the requisite features in a timely manner, we are often forced to make do without or to employ a whole new tool just to fill a relatively small role. One general example I can think of is audit logs. All too often, the inclusion of one small piece of information is all that is required to turn a vanilla IT system into a widely deployed IDS.

Waiting for Godot


The Expeditionary Fighting Vehicle (EFV) is an amphibious landing craft being developed for the Marines. The EFV is recognized as one of the top acquisition priorities for the Marines, but the program is floundering. I guess it doesn’t take much imagination to figure out how fundamental landing craft are to the mission of the Marines. The EFV was supposed to be in service over a decade ago, but reliability issues have kept that from happening. The current projected deployment date is far enough out that it might slip again, or the project might get canceled or changed drastically.

There are too many EFVs in the realm of information security. There are lots of reasons why this occurs so often, which I don’t want to discuss at the moment. Risking being called an existentialist, I declare that a system that isn’t deployable yet doesn’t exist. We’ve got to stop building and waiting on vaporware. I’ve been burnt too many times by waiting for systems that are perpetually just around the corner. I wish it weren't true, but I have my own fair share of culpability in this regard. I do believe that applying agile instead of waterfall development methods will help curtail perpetually late projects. Clearly professional integrity is also required.

Freedom as in Speech


Probably the least well known weapon system I will use as an example is Acoustic Rapid COTS Insertion (ARCI). In short, ARCI delivers rapid improvements to the sonar systems of the US submarine fleet through frequent deployments of both new software and hardware, building largely from off-the-shelf hardware, such as Intel and AMD processors, and commercial or open source software, such as the Linux operating system. ARCI has demonstrated the value of shifting from completely custom and proprietary solutions to leveraging off-the-shelf platforms in order to focus R&D resources on the features unique to the mission of the system. ARCI delivers new capabilities to the fleet at a previously unknown rate and has become a shining example of the Navy’s quest to acquire open systems. While lacking in historical track record, the Littoral Combat Ship promises to take this open systems approach to a meta level, making a ship a platform for modular mission systems that can be developed and deployed rapidly to fulfill current missions. I see great promise in this open systems approach to weapons systems.

There are already many good examples of openness in the realm of information security systems, but we need more. To remain relevant in the face of changing threats, information security systems must provide flexibility at the architectural, platform, and component level. Re-inventing the wheel is a waste of time that we can’t afford. We need to build upon established technologies and focus new development on the capabilities specific to the threats we face. We have to build openness and flexibility into our information security systems. My personal experience with ARCI has changed the way I think about developing highly specialized systems.

Security Development Call to Keyboards


I’ve intentionally masked my complaints about information security systems development with analogies to military weapons systems, so as to not have to name any specific information security tools. Whether you agree with my hasty analysis of these weapons systems or not, I hope that the characterizations I’ve tried to establish allow you to identify the allegorical class of information security systems. The information security community must do better at defending against sophisticated attacks. A portion of the need for improvement rests on the security tool development sector and the people who direct them.

As security system developers, we need to create open systems that are relevant to today’s threats. We need to build flexibility into our systems at the architectural, platform, and component level. We need to build tools that ease customization, extension, and integration with other tools. We need to respond rapidly to our users’ requests for changes to functionality. We have to shed the blinders of entrenched methods and truly innovate. We have to stop peddling vaporware.

As people who buy or direct development of security tools, we require open systems that both meet our needs today, and provide us the freedom to react to changes in the future. We must be judicious in asking for highly specialized tools that aren’t possible to develop in a short time frame and which might be irrelevant before they are completed. We must find ways to motivate our vendors to provide what we need and not more. When our vendors can’t or won’t provide the capabilities we need, we have to roll up our sleeves and do it ourselves.