Tuesday, July 19, 2011

Cyberprefixation Benchmark

Continuing my crusade against senseless buzzword use, which I began with a recent post, I've created a cyberprefixation benchmark tool. It ranks pages based on numerous variables.

Flaming examples that demonstrate all of the input variables and the upper end of the rating scale include two pages from DHS:


If this tool helps even one person then I'll feel like the (tiny) time spent on this was worth it :)

Saturday, June 25, 2011

Ruminate 06/22 Weekendly Build

It's been way too long since I've posted any public updates on Ruminate. I'd like to highlight two things from the 06/22 Weekendly Build. Enough has changed to warrant a "release", except that I haven't been able to do as much testing as I normally do.


Splitting Vortex and Ruminate Server


Previous versions of Ruminate have been based on essentially a fork of Vortex. Now Ruminate relies on Vortex (or some similar thing like tcpflow) to generate network streams and Ruminate takes it from there. This allows Ruminate to benefit immediately from any updates to Vortex and better fits the implementation paradigm I've chosen for Ruminate (loose composition of many small components). This also allows for a single instance of the stream capture mechanism (instead of one per protocol).

The new architecture looks like:



Now you start the stream distribution mechanism on the capture server with something like:

vortex {options} | ruminate_server


Note that ruminate_server doesn't take the same options as the old one. I haven't yet decided how I want to specify some of the options (like which streams get classified as which protocol and which port those streams are distributed on) so these are set in the code. In the future, I hope to make this much more flexible, allowing for protocol selection to be based not only on port, but also on content. Right now, streams are processed by the first, and only the first, protocol parser whose filter is matched by the stream. In the future, I'd like to support more than one, probably by giving a copy to each parser that wants the stream.
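
For concreteness, here's a minimal sketch of an invocation, borrowing vortex options from examples elsewhere on this blog (the file name and option values are assumptions; a live deployment would capture from an interface rather than read a pcap):

vortex -r capture.pcap -S 0 -C 10485760 | ruminate_server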


Significant Fix to http_parser


Those who have used Ruminate extensively will know that it occasionally comes across a stream that just kills the performance of http_parser. It's not that big a deal if one of many http_parsers churns for a long period of time, but it's clearly not ideal. From what I can tell, the major cause of this situation is inefficient code in http_parser in the case of an HTTP response that doesn't include a Content-Length header. I've put in a fix for this that provides orders of magnitude improvement in this case, especially if the response payload is large.


Going Forward


There are a few things I'm looking at doing going forward. I mentioned enhancing stream distribution mechanisms above.

I may also try to publicly share some performance stats of Ruminate running on a large network (~1 Gbps) to demonstrate that Ruminate really does scale well. Most of the data I've published has involved Ruminate being used on data sets much smaller than I would have liked.

I'm thinking of creating a Flash scanning service similar to the PDF service. Exploitation of SWF vulnerabilities is rampant. Like PDFs, some of the complications of SWFs (like file format compression and internal script language) are good for demonstrating the benefits of Ruminate.

The point of these object analyzers has primarily been to demonstrate the value of the framework and the associated mechanisms, but in the future I hope to innovate in detection mechanisms as well.

While my primary purpose in building Ruminate is to conduct research, I hope sharing my implementation will be helpful to some, notwithstanding the many imperfections.

Saturday, June 18, 2011

For the Jargon File: Cyberprefixation

Cyberprefixation: n.

1. the compulsive, excessive, or vain use of the term “cyber” before other words, forming words or word sequences not used in typical dialogue. In this context, “cyber” frequently indicates a narrow meaning such as information security; however, the precise meaning is almost always ambiguous. In cyberprefixation, the addition of the term “cyber” doesn’t necessarily provide any meaningful description or clarification, but rather is used predominantly for its value as a buzzword. Cyberprefixation may be used to describe all cases where “cyber” precedes the word it modifies, whether separated by punctuation or combined to form a single word. Cyberprefixation often results in the creation of nonce words.

Cyberprefixation is closely related to cyalliteration, but differs in that cyberprefixation refers to the use of "cyber" in its entirety, whereas cyalliteration leverages only the first syllable “cy”.

Example: That press release was a prime example of cyberprefixation.

See also: cybercrud and buzzword-compliant

Tuesday, May 31, 2011

Join: Relational Queries the CLI Way

In this post, I hope to share a little CLI-fu that I’ve learned that I haven’t seen used very frequently by my fellow practitioners. My hope is that I may be able to return the favor many others have extended to me by showing how to use a nifty CLI tool.

Occasionally, one comes across the need to perform relational queries on data that is stored in flat files. One legitimate tactic for doing so is to just load the data at hand into a relational database such as sqlite or mysql. There are many situations where this is less than desirable or just not practical. In such situations, I’ve seen people hack together bash/perl/whatever scripts, many of which are extremely inefficient, harder than they need to be, and/or just plain ugly. Using “join”, in conjunction with classic line based text processing utils, can provide very elegant solutions in some of these situations. Never heard of join? Keep reading as I extol its virtues!

I learned to use join working with Ruminate IDS. Ruminate creates logs for each of the processing layers data traverses. In its current state, correlating these logs at various layers of the processing stack is left for an external log aggregation/correlation system. In exploring events in flat file logs, I use join to splice multiple layers of the processing stack together, similar to what you would do using a join in SQL.

The following is some data I will use for this example. In order to sanitize the smallest amount of data possible, I’ve only included the log entries I will be using here. Feel free to intersperse (or imagine interspersing) additional dummy logs if you like (make sure to maintain ordering on the keys used for joining—see explanation far below). This data represents the processing associated with the same malicious pdf transferred over the network in two separate transactions:

$ cat clamav.log
tcp-1305036479-10.1.1.1:51770c114.143.209.62:80_http-0 Exploit.PDF-22632
tcp-1305128525-10.1.1.1:57460c89.114.97.13:80_http-0 Exploit.PDF-22632
$ cat object.log
tcp-1305036479-10.1.1.1:51770c114.143.209.62:80_http-0 2080 9e9dfd9534fe89518ba997deac07e90d PDF document, version 1.6
tcp-1305128525-10.1.1.1:57460c89.114.97.13:80_http-0 2080 9e9dfd9534fe89518ba997deac07e90d PDF document, version 1.6
$ cat http.log
tcp-1305036479-10.1.1.1:51770c114.143.209.62:80_http-0 GET haeied.net /1.pdf
tcp-1305128525-10.1.1.1:57460c89.114.97.13:80_http-0 GET haeied.net /1.pdf

Note that these log files are presented in reverse order of the processing stack. HTTP processing extracts objects and creates network protocol logs. File metadata is extracted from those objects. The objects are then multiplexed to analyzers like clamav for analysis.

Let’s say I want to look at all the files transferred over the network that matched the clamav signature “Exploit.PDF-22632”. I use the classic grep:

$ grep -F "Exploit.PDF-22632" clamav.log
tcp-1305036479-10.1.1.1:51770c114.143.209.62:80_http-0 Exploit.PDF-22632
tcp-1305128525-10.1.1.1:57460c89.114.97.13:80_http-0 Exploit.PDF-22632

Unfortunately, the TCP quad and timestamp don’t provide us much useful context. Let’s join in the http.log data:

$ grep -F "Exploit.PDF-22632" clamav.log | join - http.log
tcp-1305036479-10.1.1.1:51770c114.143.209.62:80_http-0 Exploit.PDF-22632 GET haeied.net /1.pdf
tcp-1305128525-10.1.1.1:57460c89.114.97.13:80_http-0 Exploit.PDF-22632 GET haeied.net /1.pdf

Whoa, that was easy. Note that join assumed that we wanted to use the first column as the key for joining. While we’re at it, let’s join in the object.log data, only selecting the columns we are interested in:

$ grep -F "Exploit.PDF-22632" clamav.log | join - http.log | join - object.log | cut -d" " -f 2-6,8-
Exploit.PDF-22632 GET haeied.net /1.pdf 2080 PDF document, version 1.6
Exploit.PDF-22632 GET haeied.net /1.pdf 2080 PDF document, version 1.6

One big advantage of join is that it is easy to use in conjunction with other filter programs such as grep, sed, and zcat. You might use sed to convert tcp quads from IDS alerts and firewall logs into exactly the same format so you can join them on the tcp quad as the key (a sketch of this follows the next example). Join also works very well on large files, including compressed files decompressed on the fly with zcat, so you can efficiently get at just the data you want. The following is the same query, with the difference of operating on compressed files:

$ gzip -c clamav.log > clamav.log.gz
$ gzip -c object.log > object.log.gz
$ gzip -c http.log > http.log.gz
$
$ zgrep -F "Exploit.PDF-22632" clamav.log.gz | join - <(zcat http.log.gz) | join - <(zcat object.log.gz) | cut -d" " -f 2-6,8-
Exploit.PDF-22632 GET haeied.net /1.pdf 2080 PDF document, version 1.6
Exploit.PDF-22632 GET haeied.net /1.pdf 2080 PDF document, version 1.6


Again, very easy to get a nice little report using data spanning multiple files.
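
To make the quad-normalization idea mentioned above a bit more concrete, here’s a contrived sketch; the log formats are made up purely for illustration:

# Hypothetical formats: ids.log keys look like "10.1.1.1:51770-114.143.209.62:80 ...",
# while fw.log lines look like "10.1.1.1:51770 > 114.143.209.62:80 ...".
# Rewrite the firewall quad into the IDS key format, sort both, and join on the quad.
$ sed 's/ > /-/' fw.log | sort | join - <(sort ids.log)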

To continue demonstrating join, I’m going to refer to the data used in an SQL JOIN tutorial.

I used the data in CSV form as follows:

$ cat customers.csv
1,John,Smith,John.Smith@yahoo.com,2/4/1968,626 222-2222
2,Steven,Goldfish,goldfish@fishhere.net,4/4/1974,323 455-4545
3,Paula,Brown,pb@herowndomain.org,5/24/1978,416 323-3232
4,James,Smith,jim@supergig.co.uk,20/10/1980,416 323-8888

$ cat sales.csv
2,5/6/2004,100.22
1,5/7/2004,99.95
3,5/7/2004,122.95
3,5/13/2004,100.00
4,5/22/2004,555.55

First, let’s start by generating a report for the marketing folk showing when each person has placed orders:

$ cat sales.csv | join -t, - customers.csv | sort -t, -k 1 | awk -F, '{ print $2","$4","$5 }'
5/6/2004,Steven,Goldfish
5/13/2004,Paula,Brown
5/7/2004,Paula,Brown
5/22/2004,James,Smith

Wow, didn’t that feel like you were using a relational database, albeit in a CLI type of way? Note that we had to specify the delimiter (same syntax as sort). Also, we sorted the output on customerid to ensure orders by the same person are contiguous. The astute reader, however, will notice that the report isn’t complete. We missed one of the sales on 5/7/2004. Why? From the man page we get the following critical nugget:

Important: FILE1 and FILE2 must be sorted on the join fields.

In this case we were joining on the customerid columns, which are not in the same order in the sales and customers tables. As such, we failed to join the records that weren’t sorted the same in both files. While this could be seen as a limitation of join, it is also what makes it efficient and makes it work so well with other utilities—all join operations occur with a single sequential pass through each file. Remember that “real” databases have indexes to make this sort of thing more efficient than a full table scan. No fretting though: for occasional queries, using sort to put the join fields in the same order works quite well. Also note that for a lot of security data, where the data is sorted chronologically, this requirement is frequently met with no additional effort, as shown in the Ruminate logs above. In this case, we’ll sort sales to put customerid in the same order as the customers table:

$ cat sales.csv | sort -t, -k 1 -g | join -t, - customers.csv | awk -F, '{ print $2","$4","$5 }'
5/7/2004,John,Smith
5/6/2004,Steven,Goldfish
5/13/2004,Paula,Brown
5/7/2004,Paula,Brown
5/22/2004,James,Smith

Now the order from John Smith shows up correctly.
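
As an aside, GNU join can also select output fields directly with -o (using FILENUM.FIELD notation), which makes the trailing cut unnecessary. A quick sketch using the same files:

$ sort -t, -k 1 -g sales.csv | join -t, -o 1.2,1.3,2.3 customers.csv -
John,Smith,99.95
Steven,Goldfish,100.22
Paula,Brown,100.00
Paula,Brown,122.95
James,Smith,555.55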

Let’s do another simple query for the marketing folk: Report of all the customers that have placed individual purchases of over $100—the high rollers:

$ cat sales.csv | awk -F, '{ if ($3 > 100) print $0}' | sort -t, -k 1 -g | join -t, customers.csv - | cut -d, -f2,3,8
Steven,Goldfish,100.22
Paula,Brown,122.95
James,Smith,555.55

Again, this is simple and straightforward (in an esoteric CLI type of way). If we were doing an SQL tutorial, we would have just introduced a WHERE clause. If I were going to translate this as literally as possible to SQL I would do so as follows:

CLI                                    pseudo-SQL
cat sales.csv                          FROM sales
awk -F, '{ if ($3 > 100) print $0}'    WHERE saleamount > 100
sort -t, -k 1 -g                       USING INDEX customerid
join -t, customers.csv                 JOIN customers ON customerid
cut -d, -f2,3,8                        SELECT firstname, lastname, saleamount


With the full pseudo-SQL as follows:

SELECT firstname, lastname, saleamount FROM sales JOIN customers ON customerid USING INDEX customerid WHERE saleamount > 100

For the last example, I’ll do the gratuitously ugly example from the tutorial whose data we are using. Let’s calculate the total spent by each customer:

$ cat sales.csv | awk -F"," '{SUMS[$1]+=$3} END { for (x in SUMS) { print x","SUMS[x]} }' | sort -t, -k 1 -g | join -t, customers.csv - | cut -d, -f2,3,7
John,Smith,99.95
Steven,Goldfish,100.22
Paula,Brown,222.95
James,Smith,555.55

Alright, so this isn’t so pretty, but it works.

In summary, join makes it easy to splice together data from multiple flat files. It works well in the classic *nix CLI analysis paradigm, using sequential passes through files containing one record per line. Join is particularly useful for infrequent queries on large files, including compressed files. Join plays well with other CLI utils such as sed, awk, and cut, and can be used to perform relational queries like those done in a database. I hope this short primer has been useful in demonstrating the power of join.

Friday, April 22, 2011

Faith and Security

I’d like to share some thoughts on how faith applies to us seeking to provide “security”, especially those of us in operational environments who expend large portions of our time and efforts to achieve this goal. I personally think it quite appropriate to speak of faith and religion openly and that our public/professional lives can’t (and shouldn’t) be fully abstracted from what many expect to be our private devotions. That being said, I’m going to try to avoid both general evangelization and pushing specific sectarian dogmas. I hope my remarks resonate with those sincerely trying to live their faith. I also hope these comments provide some perspective to help those who don’t believe in God better understand those who do.

Peace is greater than Security


For people of faith, security is a profane goal, and frequently an arrogantly vain one. Peace is the heavenly good that should be sought after. Absolute security is not only undesirable, but contrary to our earthly existence. I believe it necessary to live in a fallen condition, such as our current mortal existence, where we are free to grow by choosing between good and bad, including facing nearly constant adversity. Removing all opposition to good would frustrate our eternal progression. Regardless of your belief in our raison d’etre, religious and ethical codes guide adherents in how they react to adversity and hostility, including the heavenly pursuit of peace. For example, Christians believe a greater measure of peace can be found through Christ than is possible through worldly means. Peace may be had in the absence of comprehensive security, requires a great measure of discipline, and doesn’t come at the cost of sacrifices to freedom. In an ideal world, we’d all be seeking peace.

The world isn’t perfect. One responsibility we all have is to uphold freedom and provide an appropriate level of security. Ironically, one of the primary methods of pursuing security is through force and compulsion. Often people find extreme measures, such as warfare, the best option for achieving security. While there is some variance, most religions justify violence under certain conditions. Sadly, we find ourselves in a world of turmoil and warfare. Even though we often take a less effective path to security than we might otherwise hope for, our faith must be able to provide guidance to us during such pursuits.

Seek Divine Help


If we want to succeed in any endeavor, seeking divine help is always wise. Providing an appropriate level of security, one that ensures individual liberties, is an honorable pursuit in which God will assist us. One important aspect of our lives is doing honest and honorable work. Important experiences occur as we seek and receive God’s assistance in our labors. While it would be nice if we as a society devoted fewer resources to preventing bad things from happening and more to ensuring good things happened, I think most people performing work in “security” do honest and honorable work. As such, we should seek the help God has offered to those that follow him.

I like to separate God’s help into two major classes: direct help and inspiration. An example of direct help would be unexpected severe weather impeding an opponent’s advancement through the countryside. On the flip side, leaders might be inspired to advance, retreat, or do seemingly odd things, often in opposition to reason. Clearly, these two forms of assistance can go hand in hand. A classic scriptural example is the Exodus of Israel from Egypt.

Ask and Ye Shall Receive


The first thing one should do when seeking God’s help is to ask. God has promised us great blessings, if we but ask. Certainly God always knows what we need and want, but many blessings are contingent upon our sincere supplication to him. I can think of nothing more natural than praying for safety, protection, and assistance in defense. Sadly, I think this very important step is often overlooked.

Outside of scriptural accounts, when thinking about the importance of prayer in security, I often visualize Arnold Friberg’s painting of Washington’s prayer at Valley Forge. I acknowledge that the facts surrounding Isaac Potts’ story and other accounts are often disputed. Regardless, based on what I know of Washington, I believe it plausible that he (and many others) offered numerous sincere prayers to bring about the miraculous shift in the war that eventually resulted in American independence. The first step to providence is asking.

Keep Yourself Worthy of Divine Help


I wholeheartedly agree with the maxim that God helps those who help themselves. God frequently extends mercy and assistance to those who have done all in their power. While we don’t always understand the judgments of God, he also frequently withholds assistance from those who have neglected to do what they can, especially those who do so knowingly. If we want the Lord’s assistance, certainly we should be doing the very best work we can do.

Just like vigilance in preparation, maintenance, and practice is required to ensure proper operation of implements of security (ex. personal firearms, fighter jets, electronic surveillance systems, etc.), the same vigilance is required to maintain channels of divine assistance. For example, regular prayer, scripture study, and meditation are essential to ensuring constant guidance through heavenly inspiration. Obedience to laws and commandments, such as Sabbath day observance, health codes, morality and chastity, fasting, etc., brings with it promised blessings and power. Most of us know what we need to do; we just need to be vigilant in doing so. Consistently doing what is right, even if we don’t feel an acute need for help at the moment, is very much what faith is about. This sort of faithfulness invariably results in confidence and answers to prayers when the time of need does come. The parable of the ten virgins beautifully advocates diligence in preparation.

Have the Faith to Act


If God extends help, it’s important to act upon it. Sadly, people often don’t have adequate faith to be guided by the wisdom of God over the wisdom of man. Admittedly, it often requires great faith to do so, especially in a world that is largely ruled by agnostic (and often short sighted) reason. Faith and principle based decisions are frequently hard to justify, especially in the face of empiricism (well founded or not). On the other hand, when we are given strong assurances through faith, we shouldn’t be afraid to proceed with what human wisdom deems as silly courses of action. Some of the biggest disappointments of my career have occurred as I’ve ignored inspiration concerning my work. On the other hand, as we act in faith we become more confident in doing so in the future. Through experiences in small things our faith will grow to the point where we can do great things.

The scriptures are replete with examples of those who have had faith to act and those who haven’t. Infamous examples of those who lacked faith at key moments include Saul (the Old Testament King) and Pontius Pilate. On the other hand, demonstrations of great faith include those by David, Gideon, and Elisha.

Give Credit Where It’s Due


One principle that the secular world understands well, at least at first blush, is giving credit where it’s due. The sad reality is that the world makes it very hard to give adequate credit to God. In some cases it’s appropriate to keep highly miraculous or personal miracles to yourself. However, when others attribute the positive outcomes of divine assistance to you, it’s important to try to set the record straight. I feel it’s appropriate to use words such as “blessing” and “providence” to convey my belief of divine intervention to those who want to understand without unduly imposing on those who don’t. We can, and always should, give thanks to God directly through personal prayer.

One of my favorite examples of this principle is that of the preservation of Little Italy from the Great Fire of Baltimore. In 1904 the core of Baltimore City burned to the ground. As the fire swept across the city, many in the neighborhood of Little Italy met in the local church to pray for deliverance. In general, most people agree that the wind changing direction prevented the fire from crossing the Jones Falls river, saving Little Italy from the inferno. Some attribute this outcome to providence and some to chance. It’s clear, however, what the people of Little Italy believed at the time.

No Room for Pride and Hatred


If we want God’s help, both in the short and long term, we have to do things in God’s way. Faith engenders love and teaches us to avoid the pitfalls of hate and pride. While there are numerous sources I could cite, I couldn’t resist the universality of Yoda teaching about the consequences of fear and hate:

"Fear is the path to the dark side. Fear leads to anger. Anger leads to hate. Hate leads to suffering."

Lest you think this principle is merely fiction, I point out that Proverbs warns about the fear of man and that Timothy was taught that fear is not of God.

While pride is not explicitly mentioned here, I consider it to be implied by, or at least compatible with, the above quote. Pride is the grease that makes the slide from prosperity to degeneracy smooth, both for individuals and societies. While we are often forced to take extreme actions against our adversaries, we should be careful not to hate our enemies. We should beware lest we cause our own downfall and estrangement from God through our pride.

But if not...


While our faith can guide us and bring about miracles in our efforts to secure freedom, what about the times when prayers seem unanswered or the miracle doesn’t occur? We must be patient and remember that the faithful have to face the same opposition common to all mankind. We must remember that the demonstration of our faith precedes the miracle. We may think we know the ideal solution to a problem or the right timing for the solution, but often God knows differently. Our faith must be able to sustain us, bringing us peace, even in times when our efforts to ensure security seem to fail, at least in the short term.

Conclusion


I’ve shared some principles that, if followed, I earnestly believe can help those of us seeking to provide security find a greater measure of success through divine assistance. I hope these words are encouraging to those who are seeking to live their faith. For those who don’t seek to live by faith, I hope this post helps you better understand those who do.

I wish you all a happy and peaceful Easter.

Saturday, March 26, 2011

Passive Network Monitoring of Strong Authentication

There’s been a fair amount of consternation and FUD concerning the effectiveness of “strong authentication” in defending against APT. For example, in their M-trends 2011 report, Mandiant demonstrated how smart cards are being subverted. If that isn’t bad enough, RSA has recently revealed that they’ve been the victim of attacks that they believe are attributable to APT and which resulted in attackers getting access to information that may weaken the effectiveness of SecurID.

Unfortunately, like most people blogging about these issues, I can’t provide any more authoritative information on the topic other than to say that based on my personal experience, targeting and subverting strong authentication mechanisms is a common practice for some targeted, persistent attackers. It’s hard to predict the impact of any of these weaknesses. Additionally, people who have found out the hard way usually aren’t particularly open about sharing their hard knocks.

Nevertheless, I’d like to advance the suitability of passive network monitoring as a method for helping to audit authentication, especially strong authentication mechanisms. While auditing is more properly conducted using logs provided by the devices that actually perform authentication (and authorization, access control, etc if you want to be pedantic), there are real operational and organization issues that may well make passive network monitoring one of the most effective means of gathering the information necessary to perform auditing of strong authentication.

The vast majority of password based authentication mechanisms bundle the username with the password and provide both to the server either in the clear or encrypted. It is possible to provide the username in the clear and the password encrypted which would improve monitoring capabilities at the possible expense of privacy. In general, this bundling of credentials is done because confidentiality is provided through mechanisms that operate at a different layer of the stack: ex. username and password sent through SSL tunnel.

On the other hand, many authentication mechanisms provide the username/user identifier in the clear. For these protocols, passive network monitoring provides the ability to collect information necessary to provide some amount of auditing of user activity. In this post I advance two quick and dirty examples of how this information could be collected. For simplicity’s and brevity’s sake, I’ll focus solely on collecting usernames. I’ve chosen two protocols that are very frequently used in conjunction with strong authentication mechanisms: RADIUS and SSL/TLS client certificate authentication.

RADIUS


RADIUS isn’t exactly the most secure authentication protocol in the world. Since it has some serious weaknesses, it’s normally not used over hostile networks (like the internet). However, it is frequently used internally to organizations. In fact, it is very frequently used in conjunction with strong credentials such as RSA SecurID. One nice thing about RADIUS is that the username is passed in the clear in authentication requests. As such it’s pretty simple to build a monitoring tool to expose this data to auditing.

In my example of monitoring RADIUS, I’ll use this packet capture taken from the testing data sets for libtrace.

In my experience tcpdump is very useful for monitoring and parsing older and simpler protocols, especially ones that usually don’t span multiple packets, like DNS or RADIUS. The following shows how tcpdump parses one RADIUS authentication request:



/usr/sbin/tcpdump -nn -r radius.pcap -s 0 -v "dst port 1812" -c 1
reading from file radius.pcap, link-type EN10MB (Ethernet)
18:42:58.228064 IP (tos 0x0, ttl 64, id 47223, offset 0, flags [DF], proto: UDP (17), length: 179) 10.1.12.20.1034 > 192.107.171.165.1812: RADIUS, length: 151
Access Request (1), id: 0x2e, Authenticator: 36ea5ffd15130961caafc039b5909d34
Username Attribute (1), length: 6, Value: test
NAS IP Address Attribute (4), length: 6, Value: 10.1.12.20
NAS Port Attribute (5), length: 6, Value: 0
Called Station Attribute (30), length: 31, Value: 00-02-6F-21-EC-52:CRCnet-test
Calling Station Attribute (31), length: 19, Value: 00-02-6F-21-EC-5F
Framed MTU Attribute (12), length: 6, Value: 1400
NAS Port Type Attribute (61), length: 6, Value: Wireless - IEEE 802.11
Connect Info Attribute (77), length: 22, Value: CONNECT 0Mbps 802.11
EAP Message Attribute (79), length: 11, Value: .
Message Authentication Attribute (80), length: 18, Value: ...eE.*.B.._..).


Note that we intentionally haven’t turned the verbosity up all the way. While there’s a lot of other good info in there, let’s say we only want to extract the UDP quad and the username and then send them to our SIMS so we can audit them. Assuming a configuration of syslog that sends logs somewhere to be audited appropriately, the following demonstrates how to do so:



tcpdump -nn -r radius.pcap -s 0 -v "dst port 1812" | awk '{ if ( $1 ~ "^[0-9][0-9]:" ) { print SRC" "DST" "USER; SRC=$18; DST=$20; USER="" }; if ( $0 ~ " Username Attribute" ) { USER=$NF } }' | logger -t radius_request


This example generates syslogs that appear as follows:



Mar 26 14:45:15 monitor radius_request: 10.1.12.20.1034 192.107.171.165.1812: test
Mar 26 14:45:15 monitor radius_request: 10.1.12.20.1034 192.107.171.165.1812: test
Mar 26 14:45:15 monitor radius_request: 10.1.12.20.1034 192.107.171.165.1812: test


I’ve done no significant validation to ensure that it’s complete, but this very well could be used on a large corporate network as is. Obviously, you’d need to replace the -r pcapfile with the appropriate -i interface.
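
For reference, a live-capture variant might look like the following (the interface name is an assumption; tcpdump’s -l keeps the output line-buffered when piping):

tcpdump -nn -l -i eth0 -s 0 -v "dst port 1812" | awk '{ if ( $1 ~ "^[0-9][0-9]:" ) { print SRC" "DST" "USER; SRC=$18; DST=$20; USER="" }; if ( $0 ~ " Username Attribute" ) { USER=$NF } }' | logger -t radius_request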

SSL/TLS Client Certificate


Another opportunity for simple passive monitoring is SSL/TLS when a client certificate is used. It is very common for this mechanism to be used to authenticate users with either soft or hard (ie. smart card) certificates to web sites. This mechanism relies on PKI which involves the use of a public and private key. While the private key should never be transferred over the network, and in many cases they never leave smart cards, the public keys are openly shared. In the case of SSL/TLS client certificate based authentication the public key, along with other information such as the client user identification, is passed in the clear during authentication as the client certificate.

To have data for this example, I generated my own. I took the following steps based on the wireshark SSL wiki:



openssl req -new -x509 -out server.pem -nodes -keyout privkey.pem -subj /CN=localhost/O=pwned/C=US
openssl req -new -x509 -nodes -out client.pem -keyout client.key -subj /CN=Foobar/O=pwned/C=US

openssl s_server -ssl3 -cipher AES256-SHA -accept 4443 -www -CAfile client.pem -verify 1 -key privkey.pem

#start another shell
tcpdump -i lo -s 0 -w ssl_client.pcap "tcp port 4443"

#start another shell
(echo GET / HTTP/1.0; echo ; sleep 1) | openssl s_client -connect localhost:4443 -ssl3 -cert client.pem -key client.key

#kill tcpdump and server

#fix pcap by converting back to 443 and fixing checksums (offload problem)
tcprewrite --fixcsum --portmap=4443:443 --infile=ssl_client.pcap --outfile=ssl_client_443.pcap


You can download the resulting pcap here.

The client certificate appears as follows:



$ openssl x509 -in client.pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
b0:cc:6b:94:b4:83:0f:78
Signature Algorithm: sha1WithRSAEncryption
Issuer: CN=Foobar, O=pwned, C=US
Validity
Not Before: Mar 26 13:13:12 2011 GMT
Not After : Apr 25 13:13:12 2011 GMT
Subject: CN=Foobar, O=pwned, C=US
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public Key: (1024 bit)
Modulus (1024 bit):
00:e5:d6:78:cd:95:4e:89:0c:88:bd:78:98:26:86:
0b:f1:be:df:85:98:a2:93:c1:66:65:44:d2:aa:08:
69:2d:4c:a9:9d:50:08:79:1d:58:6e:6d:b4:2b:24:
ca:37:90:d6:91:9f:6d:73:5f:51:5a:10:af:f0:ce:
85:85:d6:e4:42:7b:ca:b0:af:0c:52:8b:60:1c:5b:
3f:54:10:cc:c4:35:18:a8:a6:a7:c8:ae:df:b7:ab:
a9:d9:20:cf:f7:5c:43:01:2e:12:cf:96:45:87:e7:
7e:87:f7:5e:8f:25:23:1b:ee:bd:0a:79:48:07:99:
ba:cc:68:16:53:43:56:e9:a1
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Subject Key Identifier:
BD:C2:84:BF:76:17:B7:15:BC:2F:8C:7E:A6:E6:18:B1:47:60:A3:B6
X509v3 Authority Key Identifier:
keyid:BD:C2:84:BF:76:17:B7:15:BC:2F:8C:7E:A6:E6:18:B1:47:60:A3:B6
DirName:/CN=Foobar/O=pwned/C=US
serial:B0:CC:6B:94:B4:83:0F:78

X509v3 Basic Constraints:
CA:TRUE
Signature Algorithm: sha1WithRSAEncryption
4c:28:ea:47:20:38:d5:17:dd:cf:aa:f8:13:3e:d0:5f:cf:05:
7d:c7:a1:c3:f4:3e:d7:db:56:f7:d4:d6:d6:c6:f4:5c:47:5b:
99:f6:9c:23:2d:dc:75:ab:51:8b:96:df:26:3b:9e:59:8f:2c:
08:d1:84:bf:4f:98:65:b4:0f:b7:32:9d:2f:eb:d9:a5:a6:69:
b6:75:ce:03:f4:ad:3b:f2:e6:3a:a1:ff:44:ea:8a:98:40:34:
cc:dd:e0:d8:35:0e:8b:97:20:30:e4:7b:07:52:98:63:11:32:
5e:6e:cb:c7:f1:10:67:1c:cd:e2:03:3a:99:98:8b:2f:f8:94:
03:6f


For auditing, we are interested in extracting the CN, which in this case is “Foobar”. As the client certificate is transferred over the network, the CN appears as follows:



000002e0 00 3f 0d 00 00 37 02 01 02 00 32 00 30 30 2e 31 |.?...7....2.00.1|
000002f0 0f 30 0d 06 03 55 04 03 13 06 46 6f 6f 62 61 72 |.0...U....Foobar|
00000300 31 0e 30 0c 06 03 55 04 0a 13 05 70 77 6e 65 64 |1.0...U....pwned|
00000310 31 0b 30 09 06 03 55 04 06 13 02 55 53 0e 00 00 |1.0...U....US...|


Immediately preceding the string “Foobar” is the following sequence (in hex):



06 03 55 04 03 13 06


In DER encoding, the "06 03" is the tag (OBJECT IDENTIFIER) and length (3 bytes) of the attribute type OID that follows, so it should be invariant across client certificates. The "55 04 03" is that OID, 2.5.4.3, which indicates that the following data is a CN. The "13" can vary among a few common values (it specifies the data type, in this case a PrintableString) and the "06" indicates the length of the data (6 ASCII characters). Using this knowledge of SSL certificates we can create a tool to extract and log all CNs as follows:



$ mkdir /dev/shm/ssl_client_streams
$ cd /dev/shm/ssl_client_streams/
$ vortex -r ssl_client_443.pcap -S 0 -C 10240 -g "svr port 443" | xargs -t -I+ pcregrep -o -H "\x06\x03\x55\x04\x03..[A-Za-z0-9]{1,100}" + | sed -r "s/\x06\x03\x55\x04\x03../ /" | sed 's/c/ /' | logger -t client_cert



This generates logs as follows:



Mar 26 15:26:05 sr2s4 client_cert: 127.0.0.1:41143 127.0.0.1:443: localhost1
Mar 26 15:26:05 sr2s4 client_cert: 127.0.0.1:41143 127.0.0.1:443: localhost1
Mar 26 15:26:05 sr2s4 client_cert: 127.0.0.1:41143 127.0.0.1:443: Foobar1


If you are new to vortex, check out my vortex howto series. Basically we’re snarfing the first 10k of SSL streams transferred from the client to the server as files and then analyzing them. Note that since we’re pulling all CNs out of all the certificates in the certificate chain provided by the client, we’re getting not only “Foobar” but also “localhost”, which is the CA in this case. Also note the trailing garbage we were too lazy to remove.

While this works, this is a little too dirty even for me. The biggest problem is that the streams which are snarfed by vortex are never purged. Second, we’re doing a lot of work in an inefficient manner on each SSL stream, even those that don’t include client certs.

Let’s refactor this slightly. First, we’re going to immediately weed out all the streams we don’t want to look at. In this example I’m looking for client certs in general, but you could easily change the signature to be the CA of the certificates you are interested in monitoring, e.g. “Pwned Org CA”:



$ vortex -e -r ssl_client_443.pcap -S 0 -C 10240 -g "svr port 443" | xargs pcregrep -L "\x06\x03\x55\x04\x03" | xargs rm


That will leave all the streams which we want to inspect in the current dir. If we do something like the following in an infinite loop or very frequent cron job, then we’ll do the logging and purging we need:



find . -cmin +1 -type f | while read file
do
pcregrep -o -H "\x06\x03\x55\x04\x03..[A-Za-z0-9]{1,100}" "$file" | sed -r "s/\x06\x03\x55\x04\x03../ /" | sed 's/c/ /' | logger -t client_cert
rm "$file"
done


This implementation is also probably suitable for use on a large network or pretty close to it.

For these examples, it’s assumed that the logs are streamed to a log storage, aggregation, or correlation tool for real time auditing or for historical forensics. I would not be surprised if there were flaws in the examples as presented, so use at your own risk or perform the validation and tweaking necessary for your environment. These examples are intended to be merely that—to show the feasibility. While I’ve discussed two specific protocols/mechanisms there are others that lend themselves to passive network monitoring as well as many that don’t.

In this post I’ve shown how passive network monitoring could be used to help audit the use or misuse of strong authentication mechanisms. I’ve given quick and dirty examples which are probably suitable or are close to something that would be suitable for use on enterprise networks. Notwithstanding the weaknesses in my examples, I hope they provide ideas for what can be done to “trust, but verify” strong authentication mechanisms through data collection done on passive network sensors.

Saturday, March 19, 2011

Update on Ruminate

It’s been a couple weeks, but I wanted to say a little bit about the Feb 26 Release of Ruminate.

Who should be interested in Ruminate?


This release is close to the level of refinement and capability necessary for use in an operational environment. Ruminate will be useful for people who are willing to spend extensive effort integrating their own network monitoring tools. I doubt many people will want to use it exactly as it is out of the box, but many of the components or even the whole framework (with custom detections running on top) may be useful to others. Ruminate as currently constituted is not for those who want a simple install process. Ruminate doesn’t do alerting or event correlation. It is up to the user to integrate Ruminate with an external log correlation and alerting system.

The Good


I think the Ruminate architecture is very promising. It makes some things that are very hard to do in conventional NIDS look very easy. The following diagram shows the layout of the Ruminate components:




If you are totally new to Ruminate, I still suggest reading the technical report.

The improved HTTP parser is pretty groovy. I’m really pleased with the attempts I’m making at things like HTTP 206 defrag. I think my HTTP log format, which includes compact single character flags inspired by network flow records (e.g. argus), is pretty cute.

Since I haven’t documented it anywhere else, let me do it here. The fields in the logs (with examples) are as follows:



Jan 12 01:47:39 node1 http[26350]: tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0 1.1 GET cs.gmu.edu /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 0 32768 206 1292442029 application/pdf TG ALHEk http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf - "zh-CN,zh;q=0.8" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10" "Apache"

Transaction ID: tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0
Request Version: 1.1
Request Method: GET
Request Host: cs.gmu.edu
Request Resource: /~tr-admin/papers/GMU-CS-TR-2010-20.pdf
Request Payload Size: 0
Response Payload Size: 32768
Response Code: 206
Response Last-Modified (unix timestamp): 1292442029
Response Content-Type: application/pdf
Response Flags: TG
Request Flags: ALHEk
Request Referer: http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf
Request X-Forwarded-For: -
Request Accept-Language (in quotes): "zh-CN,zh;q=0.8"
Request User-Agent (in quotes): "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10"
Response Server (in quotes): "Apache"


The Request Flags are as follows:


C => existence of "Cookie" header
Z => existence of "Authorization" header
T => existence of "Date" header
F => existence of "From" header
A => existence of "Accept" header
L => existence of "Accept-Language" header
H => existence of "Accept-Charset" header
E => existence of "Accept-Encoding" header
k => "keep-alive" value in Connection
c => "close" value in Connection
o => other value in Connection
V => existence of "Via" header


The Response Flags are as follows:


C => existence of "Set-Cookie" header
t => existence of "Transfer-Encoding" header, presumably chunked
g => gzip content encoding
d => deflate content encoding
o => other content encoding
T => existence of "Date" header
L => existence of "Location" header
V => existence of "Via" header
G => existence of "ETag" header
P => existence of "X-Powered-By" header
i => starts with inline for Content-Disposition
a => starts with attach for Content-Disposition
f => starts with form-d for Content-Disposition
c => other Content-Disposition


While not standard in any way, this log format should be very useful for my research.
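
As a quick illustration of how this format plays with standard CLI tools, the host, response code, content type, and resource of the example line above can be pulled out with awk (field positions assume the five-field syslog prefix shown; http.log is a hypothetical file name):

awk '{ print $9, $13, $15, $10 }' http.log
cs.gmu.edu 206 application/pdf /~tr-admin/papers/GMU-CS-TR-2010-20.pdf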


The Bad


Ruminate is rough. It’s nowhere near the level of refinement of the leading NIDS. This is not likely to change in the short term.

Ruminate is based on a really old version of vortex. There are lots of reasons this isn’t optimal but the biggest issue is performance on high speed networks. Soon I’ll release a new version that is either based on the latest version of vortex or one that is totally separate from, but dependent on, vortex.

Yara Everywhere


I’ve added yara to basically every layer of Ruminate. This is useful for those in operational environments because many people are used to yara and have existing signatures written for it. Since Ruminate is very object focused (not network focused), yara makes a lot of sense. While applying signatures to raw streams is not what Ruminate is about, it was easy to do and may even be useful for environments struggling with limitations in signature matching NIDS. Lastly, the use of yara, with its extensive meta-signature rule definitions, helps fill a gap in Ruminate which can’t reasonably be filled by an external event correlation engine.
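
Ruminate’s own yara hooks aren’t shown here, but as a rough sketch of what that matching step boils down to, running rules over a directory of extracted objects looks something like this (the rule file and object directory are made-up names):

find /dev/shm/extracted_objects -type f | xargs -I+ yara my_rules.yar +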

Ruminate or Razorback (or both)


I’ve been asked, and it’s a good question, how Ruminate and Razorback compare. Before I express my candid opinions, I want to say that I’m very pleased with what the VRT guys are doing with Razorback. While there is some overlap in what I’m doing and what they’re doing (at least in high level goals), there’s more than enough room for multiple people innovating in the network payload object analysis space. If nothing else, the more people in the space, the more legitimate the problem of analyzing client payload objects (file) becomes. It seems unfathomable to me, but there are many who still question the value of using NIDS for detecting attacks against client applications (Adobe, IE) versus the traditional server exploits (IIS, WuFTP) or detection of today’s reconnaissance (google search) versus old school reconnaissance (port scan).

To date, Ruminate’s unique contributions are very much focused on scalable payload object collection, decoding, and reconstruction. Notable features include dynamic and highly scalable load balancing of network streams, full protocol decoding for HTTP and SMTP/MIME, and object fragment reassembly (ex. HTTP 206 defrag). If you want to comprehensively analyze payloads transferred through a large network, Ruminate is the best openly available tool for exposing the objects to analysis. The actual object analysis is pretty loose in Ruminate today, but is definitely simple and scalable. Ruminate’s biggest shortcoming is its rough implementation and relatively low level of refinement. This isn’t a problem for academia and other research, but it is a barrier to widespread adoption.

Razorback is largely tackling the other end of the problem—what to do once you get the objects off the network (or host or other source for that matter). Razorback has a robust and well defined framework for client object analysis. While definitely in an early beta state, Razorback is a whole lot more refined and “cleaner” than Ruminate. Razorback has a centrally controlled object distribution model which has obvious advantages and disadvantages over what Ruminate is doing. Razorback’s limitations in network payload object extraction are inherited largely from its reliance on the Snort 2.0 framework, which, to be fair, was never designed for this sort of analysis.

While I’ve never actually done it, if there were a brave soul who wanted to combine the best of both Ruminate and Razorback, it would be possible to use Ruminate to extract objects off the network and use Razorback to analyze the objects. Using the parlance of both respectively, one could modify Ruminate’s object multiplexer (object_mux) to be a collector for Razorback. The point I'm trying to make is that the innovations found in Ruminate and Razorback may be more complementary than competing.

Take what you want (or leave it)


I’m sharing what I’ve implemented in hopes that it helps advance academic research and the solutions used in industry. Please take Ruminate as a whole, some components, or simply the ideas or paradigm and run with them. I’m always interested in hearing feedback on Ruminate or the ideas it advances. I’m also open to working with others on research using or continued development of Ruminate.

Saturday, January 22, 2011

Shameless plug for Colleagues' DC3 Presentations

Is it shameful to engage in cronyism, if you disclose it up front? I hope not.

While I’m not going to be attending the DoD Cyber Crime Conference this year, I’d like to draw attention to some of my colleagues who will be. Since I’m not attending, I haven’t looked at who else is speaking.

Sam Wenck, who co-presented with me last year and works side by side with me daily, is presenting on Threat Intelligence Knowledge Management for Incident Response. In essence, he’ll be speaking on how to implement the technology necessary to support intelligence driven CND. If you are interested in improving your organization’s ability to record, maintain, and leverage threat intelligence, you should attend.

Kieth Gould will be speaking to the title of “When did it happen? Are you sure about that?” I believe the original title of this preso was “How to score a date with your PC” (which Kieth routinely does). Frankly, I’m just not deep enough into host based forensics to fully appreciate the subject matter. Kieth has a reputation for his aptitude for and thorough attention to esoteric technical detail. This presentation might break the Geek Meter scale.

Having had previews of the content, I expect both these presentations to contain an abundance of pragmatic technical content and be free from annoying marketing rhetoric.

I also believe Mike Cloppert is going to be on a panel (not sure which one), but he doesn’t need any help drawing crowds.

Thursday, January 13, 2011

Gnawing on HTTP 206 Fragmented Payloads with Ruminate

I've been madly working on getting Ruminate to a point where I can recommend it to people in industry for use, hopefully by the end of January 2011. I've done a huge amount of work on HTTP decoding including a working implementation of HTTP 206 defragmentation which I consider a "killer feature" when dealing with payloads transferred through the network. I wanted to take a break from the documentation and code packaging that Ruminate so badly needs to discuss the importance of this mechanism, along with some examples. This discussion should also help clarify the areas where Ruminate is seeking to innovate.

HTTP 206 Partial Content



As NIDS begin to earnestly address true layer 7 decoding and embedded object analysis (ex. files transferred through network), they will run into complications like HTTP 206. I haven't heard much about HTTP 206 defrag so I assume this isn't on most people's radar.

What is HTTP 206? It's basically HTTP's method of fragmenting payload objects. 206 is the response code, just like 200 or 404. If you want to download just part of a file, you can ask the server to give you a specific set (or sets) of bytes and compliant servers will respond with only the data you asked for via a 206 response.
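
A quick way to see this mechanism in action is to request a byte range with curl and look for the 206 status and Content-Range header in the response (the URL here is just a placeholder):

curl -s -D - -o /dev/null -H "Range: bytes=0-32767" http://www.example.com/some.pdf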

If you're not looking for malicious content in HTTP 206 transactions, you should be. Who really cares about HTTP 206 transactions if they represent a very small number of total HTTP transactions on a network? One oft overlooked detail is that HTTP 206 is actually used to transfer a significant amount (often up to 20%) of the most interesting payloads, such as PDF documents or PE executables. Even though HTTP 206 is often used naively by unwitting clients, it is used to transfer malicious content just as well as benign content, making life harder for your NIDS in the process.

Layer 7 and Embedded Object Defrag


One of Ruminate's goals is to address layer 7 and payload object analysis with the same level of vigor that current NIDS address layer 3 and layer 4. Part of this analysis necessarily involves layer 7 and payload object defrag/reassembly just like layer 3 and layer 4 defrag/reassembly have been big topics for the current generation NIDS. HTTP 206 is a perfect example of layer 7 fragmentation that is loosely analogous to ipfrag, etc. What is an example of client application object fragmentation? Imagine you have malicious javascript and you want to evade NIDS that are smart enough to decode basic javascript obfuscation like hex armoring. One option is to split your javascript across multiple files (which all get included at run time), possibly across multiple servers/domains.

The next release of Ruminate will include thousands of lines of new and improved HTTP parsing code, including a new 206defrag service. When an individual HTTP parser node comes across an HTTP 206 response, it feeds the fragmented payload to the 206defrag service, which does the defragmentation. When the 206defrag service has all the pieces of the file, the reassembled payload is passed through the object multiplexer to the appropriate analysis service(s), ex. PDF.

I'm very pleased at the progress I've made to address HTTP 206. First of all, it actually works! In operation so far, I've been able to look at a lot of interesting payloads that I wouldn't have been able to otherwise.

I wanted to share some examples that demonstrate uses of HTTP 206 in the wild. The first example will be very straightforward and is the type of thing you’ll see most often. The other two examples demonstrate characteristics that are less common, but still happen in the real world. None of the examples were contrived or fabricated--they were taken from real network traffic that I had no direct influence on. I will, however, use them to show what I believe to be useful functionality of Ruminate. I anonymized the client IP addresses, but other than that, the data is just as observed. Note that other than being interesting examples of HTTP 206 in action, there is absolutely no malicious, sensitive, private or otherwise interesting data in the pcaps. The 206_examples.zip download includes the pcaps of the examples and the relevant logs from Ruminate. For those stout of heart enough to actually tinker with Ruminate in its current state, I’ve also included the new HTTP code in the download.

Example A


Example A is a canonical example of HTTP 206 fragmentation. Let’s start with the logs:

[csmutz@master 206_examples]$ cat http_a.log
Jan 12 01:47:39 node1 http[26350]: tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0 1.1 GET cs.gmu.edu /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 0 32768 206 1292442029 application/pdf TG ALHEk http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf - "zh-CN,zh;q=0.8" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10" "Apache"
Jan 12 01:47:39 master 206defrag: input tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0 555523 0 32768 cs.gmu.edu /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 10.101.84.70
Jan 12 01:48:17 node4 http[26947]: tcp-198787353-1294814861-1294814896-c-523548-10.101.84.70:10978c129.174.93.161:80_http-0 1.1 GET cs.gmu.edu /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 0 522755 206 1292442029 application/pdf TG ALHEk http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf - "zh-CN,zh;q=0.8" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10" "Apache"
Jan 12 01:48:17 master 206defrag: input tcp-198787353-1294814861-1294814896-c-523548-10.101.84.70:10978c129.174.93.161:80_http-0 555523 32768 522755 cs.gmu.edu /~tr-admin/papers/GMU-CS-TR-2010-20.pdf 10.101.84.70
Jan 12 01:48:17 master 206defrag: output tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0_206defrag normal 2 555523 5a484ada9c816c0e8b6d2d3978e3f503 tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0,tcp-198787353-1294814861-1294814896-c-523548-10.101.84.70:10978c129.174.93.161:80_http-0
[csmutz@master 206_examples]$ cat object_a.log
Jan 12 01:48:17 master object_mux[11977]: tcp-198786717-1294814857-1294814859-c-33510-10.101.84.70:10977c129.174.93.161:80_http-0_206defrag 555523 5a484ada9c816c0e8b6d2d3978e3f503 pdf PDF document, version 1.4

Unfortunately I don’t have time to explain in full the log formats, etc. Hopefully I'll document that somewhere more accessible than the code soon :). The first log line demonstrates the 1st HTTP transaction where the client asks the server for the first 32k of the PDF and the server obliges.

Headers are as follows:

GET /~tr-admin/papers/GMU-CS-TR-2010-20.pdf HTTP/1.1
Host: cs.gmu.edu
Connection: keep-alive
Referer: http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf
Accept: */*
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Range: bytes=0-32767

HTTP/1.1 206 Partial Content
Date: Wed, 12 Jan 2011 06:47:37 GMT
Server: Apache
Last-Modified: Wed, 15 Dec 2010 19:40:29 GMT
ETag: "56010f-87a03-497781c080540"
Accept-Ranges: bytes
Content-Length: 32768
Content-Range: bytes 0-32767/555523
Connection: close
Content-Type: application/pdf

That’s all straightforward. The HTTP parser realizes that it doesn’t have a complete payload object so instead of passing it to the object multiplexer it sends it to the 206defrag service. The next log line shows the 206defrag service receiving this fragment. Since it doesn’t have the whole object yet, it holds on to it.

After sampling the first 32k, the client gets the rest of the PDF. Headers as follows:

GET /~tr-admin/papers/GMU-CS-TR-2010-20.pdf HTTP/1.1
Host: cs.gmu.edu
Connection: keep-alive
Referer: http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf
Accept: */*
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Range: bytes=32768-555522
If-Range: "56010f-87a03-497781c080540"

HTTP/1.1 206 Partial Content
Date: Wed, 12 Jan 2011 06:47:41 GMT
Server: Apache
Last-Modified: Wed, 15 Dec 2010 19:40:29 GMT
ETag: "56010f-87a03-497781c080540"
Accept-Ranges: bytes
Content-Length: 522755
Content-Range: bytes 32768-555522/555523
Connection: close
Content-Type: application/pdf

Again, this is very straightforward. The client gets the rest of the file. Note the “Etag” and “If-Range” headers. If clients and servers consistently used this convention it might make reassembly easier. Alas, it’s frequently not used. The server was nice enough to report a content type of “application/pdf” for both fragments, doesn’t use any other content-encoding or transfer-encoding, etc. If only all transactions were this simple!
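
To make that convention concrete, this is roughly how the second request could be replayed with curl, reusing the URL and ETag from the headers above; if the ETag no longer matched, a compliant server would ignore the Range and send the whole updated file as a 200 instead:

curl -s -D - -o /dev/null -H "Range: bytes=32768-555522" -H 'If-Range: "56010f-87a03-497781c080540"' http://cs.gmu.edu/~tr-admin/papers/GMU-CS-TR-2010-20.pdf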

After receiving the 2nd fragment on the 4th log line, the 206defrag service realizes it now has the whole payload. Line 5 shows the service sending this payload object off for analysis. In line 6 the object multiplexer decides to send this file on to the PDF analyzer. Not shown here, but the PDF analysis service deems this PDF well worth reading :)

This is a very simple and clean example of HTTP 206 fragmentation. Most uses of HTTP 206 are similar to this, even if not quite this simple. In very many cases, instead of being split across separate TCP streams, the fragments are sent serially in the same stream a la pipelined request/responses. This general scenario is very common for PDFs.

One point I'd like to make here is that if your NIDS doesn't do HTTP 206 defrag, you lose the opportunity to analyze a significant portion of PDFs, at least with any analysis that requires looking at the whole PDF at once.

Example B


Example B is interesting for a couple reasons. Again, let’s start with the logs:

[csmutz@master 206_examples]$ cat http_b.log
Jan 12 02:17:56 node4 http[27618]: tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-0 1.1 GET au.download.windowsupdate.com /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe 0 816896 206 1294342831 application/octet-stream TP AEk - - "" "Microsoft BITS/6.6" "Microsoft-IIS/7.5"
Jan 12 02:17:56 master 206defrag: input tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-0 1022920 0 816896 au.download.windowsupdate.com /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe 192.168.72.14
Jan 12 02:17:56 node4 http[27618]: tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-1 1.1 GET au.download.windowsupdate.com /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe 0 0 - - - - AEk - - "" "Microsoft BITS/6.6" ""
Jan 12 02:33:26 node1 http[26761]: tcp-199054360-1294817575-1294817576-r-206649-192.168.72.14:3257c65.54.95.14:80_http-0 1.1 GET au.download.windowsupdate.com /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe 0 206024 206 1294342831 application/octet-stream TP AEk - - "" "Microsoft BITS/6.6" "Microsoft-IIS/7.5"
Jan 12 02:33:26 master 206defrag: input tcp-199054360-1294817575-1294817576-r-206649-192.168.72.14:3257c65.54.95.14:80_http-0 1022920 816896 206024 au.download.windowsupdate.com /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe 192.168.72.14
Jan 12 02:33:26 master 206defrag: output tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-0_206defrag normal 2 1022920 fc13fee1d44ef737a3133f1298b21d28 tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-0,tcp-199054360-1294817575-1294817576-r-206649-192.168.72.14:3257c65.54.95.14:80_http-0
[csmutz@master 206_examples]$ cat object_b.log
Jan 12 02:33:26 master object_mux[3282]: tcp-198921731-1294816073-1294816075-i-936869-192.168.72.14:3254c65.54.95.206:80_http-0_206defrag 1022920 fc13fee1d44ef737a3133f1298b21d28 null PE32 executable for MS Windows (GUI) Intel 80386 32-bit

At first glance, this looks a lot like the last example. There are some subtle but notable differences. First of all, the first TCP stream contains two requests, not one. While the first transaction looks normal, the log for the second is incomplete. The response payload size is "-", there is no response code, and none of the response headers are set. Ruminate can validate and parse the request but not the response, so it logs only the request metadata. To find out what's going on, we'll have to go to the packets...

Looking at packet 956, we see the second pipelined request. Presumably everything is still normal at this point:

[csmutz@master 206_examples]$ tshark -nn -r 206_example_b.pcap | grep "^956 "
956 1.259759 192.168.72.14 -> 65.54.95.206 HTTP GET /msdownload/update/software/uprl/2011/01/windows-kb890830-v3.15-delta_7d99803eaf3b6e8dfa3581348bc694089579d25a.exe HTTP/1.1

If we go farther down the packet trace, we get to the point where the client receives the header for the 2nd response, in packet 1213:

[csmutz@master 206_examples]$ tshark -nn -r 206_example_b.pcap | grep -C 2 "^1213 "
1211 1.407243 192.168.72.14 -> 65.54.95.206 TCP [TCP Dup ACK 1101#52] 3254 > 80 [ACK] Seq=581 Ack=899890 Win=65535 Len=0 SLE=935155 SRE=965425
1212 1.407254 65.54.95.206 -> 192.168.72.14 TCP [TCP segment of a reassembled PDU]
1213 1.407255 65.54.95.206 -> 192.168.72.14 HTTP HTTP/1.1 206 Partial Content (application/octet-stream)
1214 1.407347 192.168.72.14 -> 65.54.95.206 TCP [TCP Dup ACK 1101#53] 3254 > 80 [ACK] Seq=581 Ack=899890 Win=65535 Len=0 SLE=935155 SRE=965425
1215 1.407465 192.168.72.14 -> 65.54.95.206 TCP [TCP Dup ACK 1101#54] 3254 > 80 [ACK] Seq=581 Ack=899890 Win=65535 Len=0 SLE=935155 SRE=965425

Already we see something amiss. The client is incessantly ACKing data at a point partway into the payload of the 2nd response. As it turns out, the client never ACKs any more data, even though the server tries to ram the whole response down the client's buffer. It appears that the whole payload for the 2nd response is transferred over the wire, but the client never ACKs it. Ruminate handles this case by assuming the client threw away the unACKed data and doing essentially the same. Since the whole response can't be reconstructed, Ruminate punts: it provides no metadata about the response in the log and, considering the payload fragment invalid, doesn't send it to the 206defrag service. Some could argue that it would be nice if Ruminate were a little more promiscuous in the TCP reassembly and HTTP parsing. While I could see the argument that it would be nice to provide some information about the response, the current behavior is relatively simple and safe. I suspect that some other NIDS and network forensics utilities would actually use all the unACKed data, opening the door to analyzing the whole payload at this point. I can see the appeal of this approach. I'm not 100% sure I've analyzed this situation correctly, but I think Ruminate does the right thing in this case.
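As a rough illustration of that conservative choice (a sketch of the idea only, not Vortex's or Ruminate's actual reassembly code), a reassembler could simply truncate the server's stream at the highest sequence number the client acknowledged:

def usable_payload(server_stream, server_isn, highest_client_ack):
    # Keep only the bytes the client ACKed. Anything beyond that may have been
    # discarded by the client, so analyzing it risks "seeing" data the endpoint
    # never actually processed.
    acked_len = highest_client_ack - server_isn - 1   # rough; ignores SYN/FIN and wraparound details
    return server_stream[:max(acked_len, 0)]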

It seems apparent that the client discarded this unACKed data because several minutes later, it requests the second fragment over again, which it receives successfully. After the client receives this second fragment, Ruminate splices it together and the exe is sent off for analysis. The interesting part about this 2nd attempt for the 2nd fragment is that this time the client chose a different mirror to download from--it’s on the same subnet but is a different IP.

I chose this example because it points out a few things. First, it demonstrates how the classic layer 4 defrag accuracy problem can influence the layer 7 defrag problem. It also hints that the same problems apply at layer 7. What do you do if layer 7 fragments, e.g. HTTP 206 fragments, overlap? Which version do you keep if they differ? Can this be used for NIDS evasion like it was in the layer 4 case? These are the types of interesting questions I hope Ruminate aids in studying.
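Detecting overlapping layer 7 fragments is easy enough; deciding what to do when the overlapping bytes differ is the interesting policy question. A hypothetical check along these lines (a sketch, not anything Ruminate currently does):

def overlapping_bytes_differ(fragments):
    # fragments: list of (start, data) pairs received for one object
    ordered = sorted(fragments, key=lambda f: f[0])
    for (s1, d1), (s2, d2) in zip(ordered, ordered[1:]):
        overlap = s1 + len(d1) - s2
        if overlap > 0:
            n = min(overlap, len(d2))
            # Compare the region both fragments claim to cover.
            if d1[s2 - s1 : s2 - s1 + n] != d2[:n]:
                return True    # conflicting data: a candidate for evasion or corruption
    return False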

I believe this example also helps validate some of the architecture of Ruminate, from dynamic load balancing of streams to a service based approach. Since the two layer 7 fragments were sent between distinct client/server IP pairs, there is no guarantee that the conventional method of static header load balancing would send them to the same HTTP analysis node. If you are going to do this the conventional NIDS way, you are forced to accept a high cost of synchronization between the two analyzer nodes, because layer 7 defrag can involve large amounts of data spread over long periods of time. The service based approach not only fits the realities of today's commodity IT infrastructure, but also makes this problem look relatively simple.
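To see why, consider what a typical static hash over the endpoint addresses would do with example B. This is a generic illustration, not any particular NIDS's hashing scheme; because the two fragments came from different server IPs, a hash over the IP pair will often land them on different workers.

import zlib

def worker_for(client_ip, server_ip, num_workers=4):
    # Typical static load balancing: hash the endpoint pair to pick an analysis node.
    return zlib.crc32(f"{client_ip}-{server_ip}".encode()) % num_workers

print(worker_for("192.168.72.14", "65.54.95.206"))   # stream carrying the 1st fragment
print(worker_for("192.168.72.14", "65.54.95.14"))    # stream carrying the 2nd fragment -- likely a different node

With a separate 206defrag service, it doesn't matter which HTTP parser sees which stream (node4 and node1 in this example); both simply forward their fragments to the same place.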

Example C



Instead of leading off with the logs for this example, I need to explain one more wrinkle of HTTP 206. I didn't learn about this until I was trying to implement 206defrag and was disappointed to see that many of the PDFs I tried to download on my own machine weren't being successfully reconstructed by Ruminate (my computer almost always does HTTP 206 when downloading PDFs). If the client requests more than one byte range in a single request, the server puts the various responses in a MIME blob that separates the byte ranges much like multiple attachments to an email, but, from what I've seen, sans the base64 encoding. If I understand correctly, this is very similar to how some POSTs are encoded.

This is how it looks in practice:

GET /courses/ECE545/viewgraphs_F04/loCarb_VHDL_small.pdf HTTP/1.1
Host: teal.gmu.edu
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C) Creative ZENcast v1.02.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
X-REMOVED: Range
X-Behavioral-Ad-Opt-Out: 1
X-Do-Not-Track: 1
Range: bytes=1-1,0-4095

HTTP/1.1 206 Partial Content
Date: Mon, 10 Jan 2011 17:02:50 GMT
Server: Apache
Last-Modified: Sat, 20 Nov 2004 02:05:07 GMT
ETag: "25fb6-79bec-d67fac0"
Accept-Ranges: bytes
Content-Length: 4303
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: multipart/byteranges; boundary=49980f01bf1635062

--49980f01bf1635062
Content-type: application/pdf
Content-range: bytes 1-1/498668

P
--49980f01bf1635062
Content-type: application/pdf
Content-range: bytes 0-4095/498668

%PDF-1.4
...

In this case you see the client asking for, and the server responding with, the second byte of the PDF, then the first 4K of it.
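Splitting a multipart/byteranges body back into individual fragments is mechanical once you have the boundary from the Content-Type header. Here is a minimal sketch (not Ruminate's actual parser, and the whitespace handling is too naive for arbitrary binary payloads):

import re

def split_byteranges(body, boundary):
    # body: raw response payload (bytes); boundary: boundary token from Content-Type, as bytes
    parts = []
    for chunk in body.split(b"--" + boundary):
        chunk = chunk.strip()
        if not chunk or chunk == b"--":                 # preamble or closing delimiter
            continue
        head, _, data = chunk.partition(b"\r\n\r\n")
        m = re.search(rb"Content-range:\s*bytes (\d+)-(\d+)/(\d+)", head, re.I)
        if m:
            start, end, total = (int(x) for x in m.groups())
            parts.append((start, end, total, data))
    return parts

For the response above, this would yield two parts: bytes 1-1 (the single byte "P") and bytes 0-4095 (the start of the PDF), both out of a 498668 byte object.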

For brevity’s sake, I’ll only display the 206defrag “output” log:

Jan 10 12:04:02 master 206defrag: output tcp-170962418-1294678989-1294679016-c-233988-10.45.179.94:19950c129.174.93.170:80_http-0-part-1_206defrag normal 70 498668 94046a5fb1c5802d0f1e6d704cf3e10e tcp-170962418-1294678989-1294679016-c-233988-10.45.179.94:19950c129.174.93.170:80_http-0-part-1,tcp-170962418-1294678989-1294679016-c-233988-10.45.179.94:19950c129.174.93.170:80_http-1-part-1,tcp-170962841-1294678990-1294679016-c-305932-10.45.179.94:19953c129.174.93.170:80_http-1-part-4,tcp-170962418-1294678989-1294679016-c-233988-10.45.179.94:19950c129.174.93.170:80_http-6-part-1,tcp-170962418-1294678989-1294679016-c-233988-10.45.179.94:19950c129.174.93.170:80_http-7-part-2,tcp-170962841-1294678990-1294679016-c-305932-10.45.179.94:19953c129.174.93.170:80_http-2-part-1,...

In case you're curious, yes, the "70" early in the log means that the payload was assembled from 70 fragments. Furthermore, the "normal" means that the fragments were spliced together from contiguous segments without any portions of the fragments overlapping. Note that the duplication of byte 1 numerous times doesn't affect this because it's not necessary to use those fragments. In the future, I could be more granular with the logic and logging for special cases where fragments are duplicated, fragments overlap, etc. I have little knowledge of how specific HTTP clients handle situations like overlapping fragments.
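The "normal" condition boils down to checking that, once sorted, the fragments cover bytes 0 through total-1 contiguously, with redundant fragments skipped. A simplified sketch of such a splice (hypothetical; the real 206defrag logic differs):

def splice_normal(fragments, total_length):
    # fragments: list of (start, data) pairs; returns the reassembled object,
    # or None if the "normal" case (contiguous, non-overlapping splice) doesn't hold.
    out = bytearray()
    for start, data in sorted(fragments, key=lambda f: f[0]):
        if start + len(data) <= len(out):
            continue                    # fragment not needed (e.g. the repeated byte 1)
        if start != len(out):
            return None                 # gap or partial overlap: not the "normal" case
        out += data
    return bytes(out) if len(out) == total_length else None

Run over the fragments from the logs above, something like this would produce the full 498668 byte PDF, with the repeated byte 1 requests simply skipped because they aren't needed.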

One other thing of note is that these fragments are transferred through two simultaneous TCP connections (client ports 19950 and 19953) using multiple HTTP 1.1 transactions. Another thing I find interesting about this example is the seemingly sporadic order in which the fragments are requested:

The following shows the client TCP port, the HTTP transaction index in that TCP connection, the MIME part index, the fragment start index, and the fragment length.

[csmutz@master 206_examples]$ cat http_c.log | grep input | sed -r 's/tcp-.*:([0-9]+)c.*-([0-9]+-part-[0-9]+) /\1.\2 /' | awk '{ print $7" "$9" "$10 }'
19953.0-part-0 1 1
19950.0-part-0 1 1
19953.0-part-1 487541 4096
19950.0-part-1 0 4096
19953.1-part-0 1 1
19950.1-part-0 1 1
19950.1-part-1 4096 14319
19953.1-part-1 478933 1325
19953.1-part-2 477152 1781
19950.2-part-0 1 1
19953.1-part-3 480258 803
19953.1-part-4 18415 2540
19950.2-part-1 494520 4096
19953.1-part-5 481061 697
19950.3-part-0 1 1
19953.2-part-0 1 1
19953.2-part-1 32255 13312
19950.3-part-1 498616 52
19953.3-part-0 1 1
19950.4-part-0 1 1
19953.3-part-1 52049 5315
19953.3-part-2 483154 1646
19950.4-part-1 491637 2883
19953.3-part-3 57364 5529
19953.3-part-4 485870 46
...

I'm not sure I can discern any pattern to the manner in which the fragments are transferred, but it's definitely not in order. While this looks like a bit of a shotgun (double barreled in this case) approach to getting this file, it's not overly haphazard, as the fragments line up nicely. I did quickly look at the byte ranges themselves to see if they correlated to the internal structure of the PDF (objects/streams) but didn't see anything too obvious in the couple I examined. I'm also not sure why the client wants to request the second byte so frequently. According to my reckoning, the payload was reconstructed from 70 fragments, using 22 HTTP transactions, through 2 unique TCP connections. While definitely the exception rather than the norm, this is an example where the buffer-then-analyze model of Ruminate has significant benefits over the stateful incremental analysis model of conventional packet-based NIDS.

While they illustrate rare conditions, examples B and C demonstrate the types of issues I've built Ruminate to be able to study and address. As attacks continue to move up the stack, NIDS research needs to also.

Descending out of the clouds into the real world, example A isn't as uncommon as many might suppose. I'm hoping that the upcoming release of Ruminate, with vastly improved HTTP parsing capabilities, will prove useful to some in operational environments. I feel it important to reiterate that Ruminate is a research oriented tool--it's somewhere between experimental and proof of concept. The last thing I want is for Ruminate to be used in a manner that misleads someone into a false sense of security. It should go without saying, but only those who are willing to accept any limitations (presumably without knowing all of them) or are willing to do adequate vetting themselves should rely on Ruminate in production environments. That being said, I've been pleasantly surprised with what I've been able to do with Ruminate so far.

In the next couple weeks I'm going to work on refining, packaging, and documenting Ruminate so it will be easier for those who want to play with it. I hope to have this done around the end of the month.

Saturday, January 1, 2011

5 Saddest Conspiracy Theories of 2010

Is it not obligatory for bloggers to make some sort of list at New Year? Well, here is mine. I'm posting what I call the saddest conspiracy theories of 2010. These are all events that are clouded by secrecy and/or controversy, implying some amount of foul play or reckless incompetence. While all are somehow related to security or technology, some are on the periphery of the topics normally discussed in this blog. I'll only give sensational, one-sided coverage of these conspiracy theories. While I won't even try to argue the "truth" of any of these, what makes them sad is that their level of plausibility is much higher than zero.

1. Another US Gov Sponsored Backdoor


The FBI has been accused of trying to put backdoors into the IPSEC implementation of OpenBSD. It appears, at least to the founder and leader of OpenBSD, that the FBI did contract people to modify OpenBSD for the purpose of introducing bugs. However, it's unclear if the intended audience for these bugs was the whole world (unlikely), organizations with specific hardware, or just an internal experiment. I'd be receptive to the experiment explanation if it had been done openly (like my dabbling in breaking forward secrecy through OS level random escrow) or if it had never touched the internet. The commits to a public project are kind of scary. The jury is still out on this one. However, if this turns out anything like the alleged NSA backdoor in the Windows PRNG, we won't hear anything much more conclusive on this. The sad part is the community isn't wondering if the three letter agencies are trustworthy participants in the design and implementation of crypto. The answer is clear: No. The real question is how many more of these are lingering in both open and closed source software.

2. Security Theater Turns Peep Show


Yes, I had to include it. The security theater that is TSA screening at airports was bad enough in the past. It has provided basically no improvement in security, has amplified the effects of terrorism, and has been an unjustified encroachment on civil liberties. This year saw the widespread deployment of X-ray backscatter machines, also known as full body scanners. The public backlash is heating up. While there's plenty of controversy, and probably not a lot of conspiracy, the current state of airport security is just plain sad. Let's hope we can find a way to apply the same logic and tactics that are being used so effectively for "real world" security to the field of cyber security.

3. Big Brother Breathes New Life Into Wiretapping Laws


Up until a few years ago, most people thought wiretapping laws were in place to prevent people from being covertly spied on by others, especially police and spooks who are wont to do things like warrantless wiretapping. Those of us who questioned the purpose of these wiretapping laws (or the constitution, for that matter) back in the 2007-2009 time frame now have some consolation. In 2010, it became common practice for police to use local and state wiretapping laws to retaliate against people who try to hold them accountable through recordings of police in public settings. With a little luck and even more creative interpretation of the laws, even the federal wiretapping laws may be useful in the future.

4. Traditional Journalism: Too Big to Fail


While I don't want to delve into the whole Wikileaks affair, one thing I've seen coming out of it is a lot of criticism of Wikileaks. Most of the criticism from the media seems rooted more in a desire to maintain their traditional role in filtering, pushing, and disseminating news than in ensuring important news is uncovered and the public is informed. For example, when Floyd Abrams discusses Why WikiLeaks Is Unlike the Pentagon Papers, he focuses more on the narrow topic of why Wikileaks is a threat to traditional journalism than on more fundamental topics like freedom of the press or government accountability. To me it seems that the very wiki model is being attacked, not because it's inherently wrong, but because it continues to marginalize the role of established information channels. The writing is on the wall that traditional news "sources" are an endangered species, so they're in survival mode. It seems that they are often more worried about fighting turf wars and ingratiating themselves with The Man than serving their more fundamental role of public watchdog. It really doesn't matter where you fall on the professional vs. crowdsourced information flow argument: when the media are more worried about getting and maintaining government support than fulfilling their core mission, we ought to be scared. Don't worry though, the next iteration of Wikileaks, OpenLeaks, is going to put the traditional media folk back into the loop.


5. US-China Diplomacy vis-à-vis Intellectual Property


So of all the conspiracy theories, this is the 800 pound panda. While many are still waking up to it, the ever widening scope of cyber espionage being conducted by targeted, persistent attackers is alarming. Many open sources, including Google, attribute these attacks to actors in China--with largely unsupported and varying claims about the level of the Chinese Government's involvement. The US should be pursuing diplomatic solutions to this problem, the economic portion of which has been aptly seen "as a trade issue that we have not dealt with." So Hillary Clinton says with big words that China should investigate and the American people will be updated as the "facts become clear". What have we heard so far on the cyber espionage front? Not much. That's OK though, because the US has been very active this year in other tough diplomatic discussions with China. For example, Attorney General Holder visited China late this year to discuss intellectual property rights. Apparently, China promised to crack down on illegal distribution of music, movies, and software.

What a big win. First of all, we wouldn't want to go lax on software piracy enforcement, especially not in light of recent extensive abuse by oppressive regimes. The problem is so bad that Microsoft, one of the most draconian companies when it comes to software piracy and one of the most permissive when it comes to "local" law (like search result filtering), recently extended free licenses to the types of organizations where unequal software piracy enforcement is used as a pretext for oppressing dissidents. I can definitely see how the relatively extreme punishments imposed on the relatively few people actually caught pirating music and videos in the US would fit well with the Chinese model of law enforcement. Not only that, but this could help fill in some of the pretext for abuse taken away by liberal software licensing. Best yet, continued discussions like this could lay the groundwork for expanding intellectual property protections that even other western countries refuse to get caught up in. For example, wouldn't it be great if software patents, one of the US's greatest forms of meta-innovation of late, were enforced with the same vigor and uniformity in China as they are in the US?

Whether you feel like getting out your tinfoil hat or your tissue to catch your tears, I hope these critical reflections on 2010 have been amusing, even comical. Let’s all hope for better in 2011.