Friday, December 17, 2010

Announcing Ruminate IDS

I’m pleased to announce that Ruminate IDS, a system I’m building in order to conduct my PhD research, has been released as open source.

The goal of Ruminate is to demonstrate the feasibility and value of flexible and scalable analysis of objects transferred through the network: PDFs, SWFs, ZIPs, DOCs, XLSs, GIFs, etc. To the best of my knowledge, there is no other IDS out there today that focuses heavily on this or provides comprehensive facilities for it. Ruminate doesn’t do the things that contemporary NIDS do well, such as signature matching, individual packet analysis, and port scan detection. If you’re interested in learning about Ruminate, the technical report is the best place to start.

The current implementation that is available for download is built largely to gather statistics useful for academic research. I’m hoping to release a version early in 2011 that will be more appropriate for people seeking to use it in operational environments. Regardless, I was somewhat surprised by the ability of Ruminate IDS, as presently constituted, to detect live attacks by highly targeted and sophisticated actors when used on a production campus network.

Ruminate is a great example of the type of IDS that could be built on top of the utility provided by vortex. It would probably be fair to consider Ruminate a fabulous example (and facilitator) of Taco Bell Programming, with both the good and bad connotations.

Despite its many imperfections and limitations, I hope Ruminate IDS will be of value to academia and network defenders alike.

Saturday, December 11, 2010

Machine Learning Disabilities in Incident Detection

Intro


I can’t count how many times I’ve seen machine learning supposedly applied to solve a problem in the realm of information security. In my estimation, the vast majority of these attempts are a waste of resources that never demonstrate any real-world value. It saddens me to consistently see so much effort and brainpower wasted in a field that I believe has a lot of potential. I’d like to share my thoughts on how machine learning can be effectively applied to incident detection. My focus is to address this topic in a manner and forum accessible to people in industry, especially those who fund, lead, or execute cyber security R&D. I hope some people in academia might find it useful too, assuming they can stomach the information as presented here (including the lack of academic formality and original empirical evidence). For what it’s worth, I consider myself to have a pretty good amount of real world experience in targeted attack detection and a fair amount of academic experience in machine learning.

Definitions


Before I get too far, a few definitions are in order. Specifically, I need to clarify what I mean by “Machine Learning”. As used here, “Machine Learning” indicates the use of computer algorithms that provide categorization capabilities beyond simple signatures or thresholds and which implement generalized or fuzzy matching capabilities. Typically, the machine is trained with some examples of data of interest (usually attack and benign data) from which it learns through construction of a model that can be used to classify a larger corpus of observations (usually as attack or benign) even when the larger corpus contains observations that don’t exactly match the observations in the training data.
With “Incident Detection”, I’m trying to be a little broader than the classic definition of Intrusion Detection or NIDS by adding in connotations relative to Incident Response. I almost used CND, but that isn’t quite right because CND is a very broad topic. “Using Machine Learning for CNA Detection” would be an accurate alternate title. While I’ll be using NIDS heavily in my examples, note that for me NIDS isn’t merely about detecting malicious activity on the network; it’s also about detecting and providing forensics capabilities to analyze otherwise benign attack activity performed by targeted, persistent attackers (in other words, supporting cyber kill chain analysis).

References


During this short essay, I’ll reference two academic papers. The first is the PhD thesis of my friend, mentor, and former boss, Rohan Amin. His thesis, Detecting Targeted Malicious Email through Supervised Classification of Persistent Threat and Recipient Oriented Features, is the best example of the useful application of machine learning to the problem of incident detection I’ve ever seen. I conversed with Rohan about his research from start to finish and have largely been waiting to write this essay until he finished his thesis so I would have a positive example to talk about. His research is refreshing, from choosing one of the most pressing security problems of the APT age to making brilliant technical contributions. If rated against the recommendations I will make herein, Rohan’s paper scores very high.

My second reference is Outside the Closed World: On Using Machine Learning For Network Intrusion Detection, which was presented at IEEE S&P 2010. Robin Sommer and Vern Paxson are academic researchers with serious credentials in the field of NIDS. They are probably best known in industry for their contributions to Bro IDS. Their paper is geared to academics but tries to encourage some amount of real world relevancy in research. I sometimes laugh cynically at the political correctness and positive tone with which they make recommendations to researchers such as “Understand what the system is doing.” While I don’t agree with everything Sommer and Paxson say, they say a lot that is spot on, the paper is well written, it provides a good view into how academics think, and it even explicitly, albeit briefly, calls out the difference in approach required for opportunistic and targeted attacks.

Solve a Problem (worth solving)


Sommer and Paxson said it so well:

The intrusion detection community does not benefit any further from yet another study measuring the performance of some previously untried combination of a machine learning scheme with a particular feature set, applied to something like the DARPA dataset.

Amen. The engineer in me scoffs at what I see as a haphazard and inefficient process of invention: combining one of the set of machine learning techniques with one of the set of possible problems, often apparently pseudo-randomly, until a good fit is found through empirical evaluation. Sure, there are numerous examples where this general approach has worked in the past; Goodyear’s invention of sulfur vulcanization for rubber, for example, is often thought to have happened by luck. Certainly this methodology is at least compatible with Edison’s maxim that genius is 1 percent inspiration and 99 percent perspiration. While systematically testing every permutation of machine learning algorithms, problems, and other options such as data sets and feature selections is perfectly valid, I don’t like it. Most people investing in research probably shouldn’t either. One of the problems I see with this in the real world is that many people have what they think is a whiz bang machine learning algorithm, possibly even one working well in a different domain. Since cyber security is a hot topic, people try to port the whiz bang mechanism to the problème du jour, i.e. cyber security. Often these efforts fail not because there isn’t some way in which the whiz bang mechanism could provide value in the cyber security realm, but because the mechanism isn’t applied to a specific enough or relevant enough problem, poor data is used for evaluation, and so on.

One strong predictor of the relevancy of the research being conducted, and of the technology that will come from it, is the relevancy of the data being evaluated. Could it be any clearer that if you are using data too old to reflect current conditions, you can have little confidence that your resulting technology will address today’s threats? Furthermore, if you are using synthetic data, you may be able to show empirically that your solution solves a possible problem under certain conditions, but you have no guarantee that the problem is worth solving or that the conditions assumed will ever be reached in the real world. Sommer and Paxson largely trash any research that relies predominantly on the DARPA 1998-2000 Intrusion Detection Evaluation data sets, a position with which I passionately agree.

While the relevancy of the data being evaluated is a pretty good litmus test for the relevancy of the technology coming from the research, I believe it’s much more fundamental than that. Below I present two models for R&D. In the S-P-D process, novelty is ensured by taking a solution and using increasing innovation and discovery to find a problem, and then a data set/feature set, for which the solution can be empirically shown to be valid. This correlates to the all too frequently played out example I alluded to above, where a whiz bang machine learning algorithm is applied to a new domain such as cyber security. The researcher spends most of his time figuring out how to apply the solution to a problem, including finding or creating data that shows how the solution solves a problem. Clearly, there is little guarantee of real world relevancy, but academic novelty is assured throughout the process. On the other hand, in the D-P-S process, relevancy is ensured because the data is drawn from real world observation. By evaluating data from real world events, a problem is discovered, described, and prioritized. Resources are dedicated to research, and a useful solution is sought. Academic novelty is not necessarily guaranteed, but relevancy is systemic. Rohan’s PhD research exemplifies the D-P-S process. Between 2003 and 2006, Targeted Malicious Email (TME) evolved into the principal attack vector for highly targeted, sophisticated attacks. As the problem of APT attacks became more severe and more was learned about the attacks, TME detection was identified as a critical capability. Analysis of the data (real attacks) revealed consistent patterns between attacks that current security systems could not effectively detect. Rohan recognized the potential of machine learning to improve detection capabilities and did the hard work of refining and demonstrating his ideas.



While I’m normally not a fan of this sort of model and diagram, I want to make this point clear to the people funding cyber R&D. If you want to improve the ROI of your cyber R&D, make sure you are funding D-P-S projects, not S-P-D research. What does that mean for non-business types? The most important thing cyber security researchers need today is Data demonstrating real Problems. In the current climate, there is an overabundance of money being poured into cyber R&D. I agree with the vast majority of the recommendations given by Sommer and Paxson regarding data, including the recommendation that NIDS researchers secure access to a large production network. Researchers should also understand the threat environment of that network. I will add that if individual organizations, industries, and governments want to advance current cyber security R&D, the most important thing they can do is provide researchers access to the data demonstrating the biggest problems they are facing, including the required context. For more coverage on the topic of sharing attack information with researchers, see my post on how Keeping Targeted Attacks Secret Kills R&D.

On Problem Selection


In my very first blog post, I discussed Developing Relevant Information Security Systems. Some of the ideas presented there apply to the discussion at hand.

Machine learning as applied to intrusion detection is often considered synonymous with anomaly detection. Even Sommer and Paxson equate the two. Maybe this springs from the classic taxonomy of NIDS that branches into signature matching and anomaly detection. Personally, I question the value of this taxonomy. Certainly NIDS like Bro somewhat break it, requiring it to be expanded to at least misuse detection or anomaly detection. Even that division isn’t fully comprehensive. Detecting activity from persistent malicious actors, even if that activity isn’t malicious per se, is also an important task for NIDS, but it doesn’t fall cleanly under traditional definitions of either misuse detection or anomaly detection.

Regardless of how you classify your NIDS, I don’t agree with equating machine learning and anomaly detection. Machine learning can be applied to misuse detection, can’t it? While Rohan’s PhD work isn’t fully integrated with any public NIDS, it very well could be. Similarly, anomaly detection systems as discussed in academia often use machine learning to create models for detection, but it’s equally possible for anomaly detection systems to use thresholds or models created by human experts.

The biggest problem I have with equating machine learning with anomaly detection is that anomaly detection is largely a nebulous and silly problem. Equating the two trivializes machine learning. It’s pretty easy to identify statistically significant outliers in data sets. The problem is that the designation of something as anomalous is often rather arbitrary, with most researchers doing little to demonstrate the real world relevancy of any anomalous detections. Furthermore, for all but the most draconian environments, anomaly detection is silly anyway. Anyone with operational experience knows that the mind-numbingly vast majority of “anomalous” activity is actually benign. Furthermore, highly targeted attacks are quite often, by design, made to blend in with “normal” activity.

4 Principles


Most of the discussion heretofore has been targeted at people making high level decisions about R&D. Now, I’ll provide some more concrete principles that can be applied by people actually implementing machine learning for Incident Detection. They are as follows:


  • Use Machine Learning for Complex Relationships
  • Serve the Analyst
  • Features are the Most Important
  • Use the Right Algorithm


Use Machine Learning for Complex Relationships (with Many Variables)


When should you use Machine Learning instead of other traditional approaches such as signature matching or simple thresholds? When you have to combine many variables in a complex manner to provide reliable detections. Why?

Traditional methods work very well for detection mechanisms based on a small number of features. For example, skilled analysts often combine two low fidelity conditions into one high fidelity condition using correlation engines or complex rule definitions. I’ve seen this done manually with three or more variables, but it gets ugly very quickly as the number of variables increases, especially when each dimension is more complex than a simple binary division.

On the other hand, machines, if properly designed, function very well with high dimensional models. Computers are adept at analyzing complex relationships in n-dimensional space.
Why not use machine learning for low dimensional analysis? Because it’s usually an unnecessary complication. Furthermore, humans are usually more accurate than machines at dealing with the low dimensional case because they are able to add contextual knowledge often not directly derivable from a training set.

Serve the Analyst


Any advanced detection mechanism must serve the analyst; it will fail otherwise. By serving the analyst, I mean empowering and magnifying the efforts of the analyst. The human should ultimately be the master of the tool. There are actually people, including a lot of researchers, who believe (or purport to believe) that tools such as IDS should (and can) be made to house all the intelligence of the system and that the role of humans is merely to service and vet alerts. This is so backwards that I can’t believe some people seriously hold this view, and it’s sad to see it play out in practice. Much like airport security, which has gotten out of hand with increasingly intrusive screening that provides little to no value, I have to question the motives of the people pushing this mindset. Is it even possible for them to believe this is the right way to go? Are they just ignorant and reckless? Maybe it just comes down to greed or gross self-interest. Regardless of the reason, this mindset is broken.
Toggling back to the positive side, machine learning has great potential to empower analysis. Advanced data mining, including machine learning, should be used not only to aid the analyst in automating detections but also in understanding and visualizing previous attack data so that new detections can be created.
It is vital that the analyst understand how any machine learning mechanism works under the hood. For example, an expert should understand and review the models generated by the machine so that the expert can provide a sanity check and understand the significance of the patterns the machine identifies. One of the coolest parts of Rohan’s PhD thesis is that he uncovered many pertinent patterns in the data, such as the most targeted job classes. In addition, as the accuracy of the classifier begins to wane over time, it is the expert analyst who will be able to recommend the appropriate changes to the system, such as new features to be included in the analysis.
Part of empowering analysts is giving the analyst the data needed to understand any alerts or detections. Any alert should be accompanied by a method of determining what activity triggered the alert and why that activity is thought to be malicious. Many machine learning mechanisms fail because they don’t do this well. They will tell an operator that they think something may be bad, but can’t or won’t tell the operator why, let alone provide sufficient context, making the operator’s job of vetting the alert that much harder. Incidentally, if a machine learning based detection mechanism provides adequate context, it lowers the cost and pain of validating false positives, lessening their adverse impact on operations.

For an advanced detection mechanism to succeed in an operational environment, it must be made with the goal of serving the expert analyst. I believe much of the “semantic gap” described by Sommer and Paxson arises from ignoring this principle.

Features are the Most Important


The most important thing to consider when applying machine learning to computer security is feature selection. Remember the 2007 financial system meltdown? The author of much of the software that “facilitated” the meltdown wrote an article describing his work and how it was abused by reckless investment banks. Glossing over the details (which are very different), the high level misuse case is often the same as in cases of abuse of machine learning: people hope that by putting low value meat scraps into some abstract and complicated meat grinder of a machine, they will get output that is better than the ingredients put in. It’s a very appealing idea. If you can turn things you don’t want to look at into hot dogs or sausage by running them through a meat grinder, why can’t you turn them into steak with a really big, complex meat grinder? Machine learning mechanisms can be very good at targeting specific and complex patterns in data, but at the end of the day, GIGO still applies.

Expressiveness of Features


The most important part of using machine learning for IDS is to ensure that the machine is trained with features that expose attributes that are useful for discriminating individual observations. A classic example from the world of NIDS is the inadequacy of network monitoring tools that operate at layer 3 or layer 4 for detecting layer 7 (or deeper) attacks. When I get on the network payload analysis soapbox (which I often do), one of my favorite examples is as follows:

Imagine you have an open email relay that sends your organization two emails. Both are about the same size, both contain an attachment of the same type, and both contain content relevant to your organization. One is a highly targeted malicious email; the other is benign.

Can you discriminate between the two based on netflow? Not a chance. There is nothing about the layer 3 or layer 4 data that is malicious. Remember, the malicious content is the attachment, not anything done at the network layer by the unwitting relay. It doesn’t matter how many features you extract from netflow or how much you process it, you’re not going to be able to make a meaningful and reliable differentiation.

It’s crucial when using machine learning as a detection mechanism that you have some level of confidence that the features can actually be used to draw meaningful conclusions. The straightforward way to do this is to have analysts identify low fidelity indicators that, when combined in complex ways, will yield meaningful results. Sure, some data mining may be involved here, and the process may be iterative, but you’ve got to have expressive and meaningful features. In my estimation, the biggest contribution Rohan makes with his study is demonstrating the value of features that most other mechanisms ignore (and which, incidentally, are harder for attackers to change).

Disparate Data Sources as a Red Herring


One claim made in support of machine learning is that with machine learning, you can correlate disparate data sources. This is really a red herring: you don’t necessarily need machine learning to do this. I’ve seen traditional SIMs, processing a wide variety of data feeds, used to make really impressive detections based on analyst crafted rules that aren’t particularly complex in and of themselves, but which require a lot of work and technological horsepower behind the scenes because they leverage data from multiple sources. Sure, machine learning facilitates the use of complex relationships in data, but those relationships don’t necessarily have to span disparate data sources.

That being said, machine learning can be wildly successful at leveraging complex relationships within disparate data sources; Rohan’s PhD work demonstrates this fabulously. One temptation, however, is to try to unnaturally “enrich” data, often consisting of inadequate features to begin with, by joining in yet other features, in the hope of improving the quality of the models generated. This is all fine and well if the data joined provides some utility in classification. Also, for most machine learning techniques, if all classes in the training data set are adequately represented and the training set has adequate entropy, no serious harm is done by joining in features with no value for improving classification. However, if some classes are under-represented (as is often the case with the “bad” examples) or if the training data doesn’t have adequate entropy (as is often the case with artificial data), “enriching” data with other data sources can incorrectly improve measures of statistical significance and performance of the machine learner in a way that wouldn’t apply to real world data. Returning to our example of the email which can’t be detected with netflow data, let’s assume the benign email is sent by the relay with an ephemeral source port of 36865 and the malicious email is sent with a source port of 36866. Now let’s say that the researcher wants to “enrich” his data by adding all sorts of lookups based on the layer 3 and layer 4 parameters, such as geoip lookups. If the researcher joins IANA assigned port numbers into the mix, the machine’s model will discover that the benign email was sent with a source port of “kastenxpipe” and the malicious email with a source port of “unassigned”. The spurious conclusion is clear: malicious emails sent through ignorant relays originate from “unassigned” source ports. This example is contrived, but this sort of thing actually occurs.

By far the most important thing to get right when applying machine learning to the field of incident detection is operating on meaningful features.

Use the Right Algorithm (but don’t fret about it)


One aspect of applying machine learning to incident detection is choosing the right algorithm. This is also the aspect that is usually belabored the most in academia, especially in research that is farthest from being applicable to real world problems. There are a lot of religious battles that go on in this realm as well. However, very little of this provides real world value.

My suggestion is to choose the algorithm, or one of the set of algorithms, that makes sense for your data and how your system is going to operate. Don’t fret too much about it. I think of this selection much like choosing a cryptographic algorithm. The primary factor is choosing the type of cryptographic function: hash, digital signature, block cipher, complete secure channel, etc. To a large degree, it probably doesn’t matter if you choose SSL, SSH, or IPsec for use as a secure channel. Sure, some small factors, or even external factors, may make one slightly more desirable, but at the end of the day, any from the palette of choices will likely provide you an adequately secure channel, all other things being equal.

Also, similar to making choices for crypto systems, you should avoid inventing or rolling your own unless you have a compelling reason to do so and you know what you are doing. All too often, I see exotic and home-grown machine learning techniques applied to information security. Often I see ROC charts, figures on performance, and other convoluted diagrams justifying these sorts of things. Just like with crypto, I think it’s appropriate to hold researchers to a high burden of proof to demonstrate the real world benefit of any “bleeding edge” machine learning mechanisms being applied to incident detection.

Again, Rohan’s PhD work exemplifies the principles I’m trying to express. He chose a machine learning mechanism that fit his data and use cases well. While he did spend a fair amount of time and effort trying to tweak the classifier (see the cost-sensitive classification work), this had marginal benefit. He provides few suggestions for future work on improving the machine learning mechanisms themselves. He does recommend, and I agree, that the overall system could be improved by exposing more relevant features (such as file attachment metadata) and by tightening outcome classes, separating the “bad” class into multiple groupings based on similarity of attacks.

With that high level principle out of the way, I’ll say a little about specific classes of mechanisms or specific algorithms. In doing so I’ll express a few biases and religious beliefs that aren’t backed with the same level of objectivity contained in the rest of this essay.

Random Forests


I love Random Forests, and lots of other people do too. Random Forests works well with numerical data as well as other data types, like categorical data. While Random Forests may not be the simplest example, tree-based classification mechanisms are very easy to understand and, once a classifier is trained, insanely efficient at classifying new observations. The algorithm takes care of identifying variable importance and tuning the classifier accordingly. Many other mechanisms can only do part of this, require a large amount of manual tuning, require manual data normalization, and so on. Random Forests is easy and works very well in many situations.

Text Based Mechanisms


Text based mechanisms are all the rage. They are awesome for helping make sense of human to human communication. For example, the Bayesian algorithms used in spam filtering are actually rather effective at identifying and filtering spam based on the text intended for human consumption. Document clustering mechanisms are very effective at weeding through large corpora of documents, identifying those about similar topics. There is a huge amount of contemporary research on, and new whiz bang mechanisms related to, text mining, natural language processing, etc.

For the part of information assurance that requires operating on human to human communication, text based machine learning mechanisms hold high potential. However, most communication of interest in incident detection isn’t human to human but computer to computer, and a large portion of computer to computer communication is done through the exchange of numerical data. It is somewhat humorous to see researchers attempt to apply text classification mechanisms to predominantly numerical data, such as network sensor data. While there may be legitimate reasons to do this, I view these efforts with the same cynical doubts about longevity with which I regard efforts to vectorize logical problems into problems suitable for floating point operations so GPUs can be leveraged.

R: Freedom in Stats and Machine Learning


One tool that I have to give a quick shout out to is R. Many people call R the free version of S (S is a popular stats tool), just like people say Linux is the free version of Unix; it’s a pretty close analogy. R is not only free as in beer, but very free as in speech. There’s a huge and growing community supporting it. People who like Linux, Perl, and the CLI will love R. One thing I like about R is that everything you do is done via commands, and those commands are stored in a history, just like bash. If you want to automate something you’ve done manually, all you do is turn your R history into an R script. It’s easy to process stats, create graphs, or run machine learning algorithms without ever touching a GUI. It is much like LaTeX in that it has a steep learning curve, but people who master it are usually happy with the things they can do with it.
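As a rough sketch of that history-to-script workflow (the data file and column names here are hypothetical, purely to illustrate the idea):

$ R                                   # interactive session: poke at the data
> alerts <- read.csv("alerts.csv")    # hypothetical export of alert data
> table(alerts$sensor)                # quick per-sensor counts
> savehistory("summarize_alerts.R")   # dump the session history to a file
> q()
$ vi summarize_alerts.R               # trim out the dead ends
$ Rscript summarize_alerts.R          # rerun the cleaned-up analysis unattended
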

Conclusion


I hope that in the future there will be a greater measure of success in applying machine learning to incident detection. I hope those funding and directing research will help ensure a greater measure of relevancy by providing researchers with the data and problems necessary to conduct relevant research. I also hope that the principles I’ve laid out will be useful for people other than myself in helping to guide research in the future.

Thursday, October 14, 2010

Touching Number 1

While visiting Oak Ridge, I was given the opportunity to not only see, but also touch, the #1 ranked supercomputer in the world. This BlackBerry-snapped photo shows just enough detail to capture the nerdy nirvana of the event:


Amazing!

Saturday, September 18, 2010

Are Targeted Attacks on Industry Cyberwar?

I’m writing this post to try to enter the conversation on cyberwar. My motivation in doing so is not only to share my opinions on the topic, but also to add my witness to the few others out there testifying that targeted attacks pose a real and extant threat to our long term national prosperity.

Before I start, I need to clarify my viewpoint. I’m a technical person; I do technical work, like programming computers. I don’t have any political, social, or economic influence. I do have a lot of operational experience doing incident response, especially against highly sophisticated attacks. However, since my current and past employers and universities don’t allow me to speak about the specifics of attacks, I can only cite general observations and trends. I stand to gain very little from the comments I’ll be making. My primary goal is to help shape public opinion.

Targeted, Persistent Attacks


Throughout this article, I’ll be speaking about highly targeted, persistent attacks perpetrated by well organized attack groups for the apparent purpose of stealing sensitive information, including trade secrets. Many people use the term Advanced Persistent Threat (APT) to describe this category of attackers. Some people use it to describe some specific subset (which they often imply isn’t a strict subset) of this attack class, and as such use it as a proper noun. Even though many imply some coherent rationale for their grouping, they usually won’t elucidate it in public. I tend to use terms like targeted attacks and persistent attackers to ensure people understand I’m talking about the general attack class. That being said, the vast majority of what has been said by people in the know about APT applies to what I’ll be saying, regardless of whether you consider APT a general attack class or a specific attack group. Just to be explicit, examples of APT discussions that I believe to be on the mark are those by Mike Cloppert and Richard Bejtlich. On the other hand, examples of wantonly ignorant discussions about APT include those by McAfee and Damballa. One quick litmus test: if someone supposedly discussing APT closely relates the activity to botnets, identity theft, or insider threat, they’re not talking about the same thing I am.

Most of my discussion will focus around highly targeted attacks for the purpose of compromising sensitive information, especially against industry. I’ll intentionally avoid speculating on important issues such as the ability of terrorists to use vulnerable computer systems to cause mass disruption and destruction. The one thing I will say is that there are a lot of projections about how information systems could be exploited for malicious intent. Many of these are still hypothetical. APT attacks are real today and are becoming more prevalent as time passes.

Attacks on Industry


One of the most disturbing aspects of highly targeted and persistent attacks is that they are becoming more common against private industry. Governments have always had to worry about spies breaking into their systems and have supposedly been developing systems to counter APT level threats for some time. Private industry isn’t used to having to defend against APT class attacks, and companies like Google are being caught off guard. These highly targeted attacks are resulting in the compromise of information that normally isn’t compromised: things like trade secrets and proprietary information. The perpetrators aren’t going after credit cards or SSNs; they’re going after trade secrets. Many people consider this sort of information one of the most valuable classes of assets in the US economy. The use of this information by competitors represents a serious threat to the long term prosperity of any information based company and, by extension, the competitiveness of the US economy. This is really scary. Even the military types recognize the risk. I think it demonstrates some serious means/ends inversion, but when military types start talking about threats to US prosperity inhibiting our ability to conduct war, we ought to listen. We need to remember that self defense is merely a means to the ends of freedom, peace, and prosperity. Highly targeted attacks don’t just endanger short term national security; they are a serious threat to the US’s long term peace and prosperity. Throughout this post, I’m going to focus primarily on attacks against industry.

Cyberwar?


Are targeted, persistent attackers waging cyberwar? This is a hard question. First, modern society has confounded the meaning of war, using it for things like “Cold War”, “War on Terror”, and even “War on Christmas”. It’s hard to clearly define what warfare is.

Clearly, cyber- (i.e. something related to computers or networking) is used pervasively in modern warfare. Militaries have driven many of the developments in technology and communication that are now used by civilians, and the military uses computers, networks, and robots extensively to conduct warfare. While using cyber- in this context probably lines up with other prefixes such as modern- (e.g. using gunpowder) and chemical-, this doesn’t comprise all of what most people, including the US military, mean when they say cyberwar.

The US military has applied a much broader meaning to cyberwar, defining it as a battle space or domain much like land, air, and sea. I’m not sure I fully agree with the rationale behind this definition, but it’s theirs to make. However, using this definition, targeted, persistent attacks with the apparent goal of collecting sensitive information don’t line up with cyberwar, because no disruption occurs. Using US government parlance, this activity is probably better categorized as cyber-espionage.

Cyber-Espionage?


If persistent, targeted attacks seeking sensitive information aren’t classed as warfare, maybe they are appropriately classed as cyber-espionage. Recently, Gen. Michael Hayden spoke at Blackhat on this very subject. What he said seems to be basically in line with the rest of what the US government has said on these topics. His basic assertion was that intelligence gathering isn’t cyberwar; attacks targeting sensitive information like those I’ve been speaking of are just part of business as usual, at least for cyber-spies. He expounded the partitioning of the cyber domain into 3 sub-domains: CND (defense--stopping the other two), CNE (exploitation--for espionage), and CNA (attack--for disruption or destruction). A lot of what he said makes sense, and he dispels a lot of FUD. At the very least, most of what he said is technically correct.

Information as the End


A couple months ago, I would have agreed with this categorization of APT attacks as cyber-espionage. Then I listened to this podcast. Something Rob Lee said struck a chord with me. He said, in short, that information is an asset over which modern wars are being fought, much like the riches of land or gold in previous centuries. I’d never thought of information as the end of warfare, only as the means. I think this way of looking at targeted attacks warrants more discussion. What if cyberwar isn’t just about aggressors using IT as a means to conduct warfare? What if the purpose of cyberwar is to wrest highly valuable information away from the enemy, just like land or gold in traditional warfare? This isn’t information warfare, because the information targeted is not necessarily about warfare. Attacks targeting industry trade secrets aren’t espionage by most people’s definition because the secrets being taken aren’t military or political in nature--they are largely economic. This is essentially economic espionage.

Cyber-Piracy?


It’s a shame that people in industry have used the term piracy for actions that are more akin to petty theft. If it weren’t already taken, cyber-piracy would seem like a good way to describe the theft of sensitive information of economic value using military-like force. That’s really what’s happening to industry now: persistent attackers are forcibly stealing highly valuable trade secrets. One of the reasons I’d like to compare this to naval piracy is that it must be perpetrated by a military-like force and is usually best answered with military or para-military force. I can visualize trade secrets being exfiltrated by hackers as gold or other goods being carried off by pirates in ships. The value of the data lost to targeted attacks is immensely high, but it is not normally discussed and is easy to conceal. Regardless, if the value of the data stolen from private industry through targeted attacks were known, it would probably be considered a justifiable reason to wage war against the perpetrators.

On Attribution


One thing that many people seem to get preoccupied with is the issue of attribution for highly targeted attacks. Many facets of these attacks make it very unlikely that the attacks are perpetrated merely by organized crime without some level of support or tolerance by national governments. For example, highly persistent attackers usually target information that is not highly liquid and as such could only be of value to a small set of possible markets. Are these attacks directly sponsored, indirectly guided, or loosely condoned by foreign nations? Most of us will never know that answer. For most people, it really doesn’t matter. The actions that should be taken to solve the targeted attack problem don’t change that much regardless of how much foreign government support is behind these attacks. Lay people should be pushing for diplomatic, legal, and possibly military pressure to stop them.

China


Numerous open sources have implicated China in targeted attacks. My favorites include the NG report on PRC cyber-warfare and CNE and Shadows in the Cloud. The attacks on Google earlier this year and the subsequent response by Google are probably the best known public example. The most compelling evidence of Chinese involvement is that Chinese human rights activists were targeted by these attacks. It is hard to imagine anyone other than a Chinese supporter having adequate motivation to conduct this sort of attack. Of course, this doesn’t mean that the attacks were perpetrated by agents of the Chinese government. Indeed, the Chinese government often claims that it is a victim of hacking itself. Clearly the Chinese government has other high priority issues to address, such as ensuring that the constitutionally granted right to free speech is protected.

That being said, I think the focus on China is a little myopic. I find it hard to believe that all targeted attacks on industry are from one source, and even if they are, how long will it stay that way?

It Takes Two to Fight


As mentioned previously, the extent of the damage caused by targeted, persistent attacks is probably great enough to justify a war. If there’s one element missing from cyberwar, it’s our response. I’ve heard the terms cyber-Pearl Harbor and cyber-9/11 bandied about, but up to this point there has not been a single decisive attack and associated response that comes close to earning these titles. I doubt such an event will ever be associated with targeted attacks on industry. Sure, terrorists and the like may well perpetrate an event that earns the appellation of cyber-9/11; terrorists intentionally perpetrate highly visible and dramatic attacks. APT attacks are exactly the opposite: they are stealthy and deceptively mundane in their methods. Unlike terrorists, whose goal is to gain attention, targeted, persistent attackers seem to prefer keeping things quiet. To make matters worse, most of the victims of these attacks like to keep their losses secret too. In the past, I’ve discussed how keeping targeted attacks secret stifles the development of technical solutions.

From everything I can tell, the US is not fighting back to protect industry from targeted, persistent cyber attacks. The military is trying to hash out its internal turf wars about who will own the cyber domain. Beyond that, the US government is still trying to figure out who, if anyone, is going to help defend industry against cyber threats. Based on the recent reports of a huge breach of the government’s classified networks, it appears the government and military are struggling to defend their own networks. While DHS claims to have a division dedicated to cyber security, it appears that they are not concerned about the theft of trade secrets from industry, preferring to focus their efforts on protecting critical infrastructure from attacks like those terrorists would like to be able to perpetrate. Defending industry from targeted attacks is not a battle anyone is openly fighting, even though industry is getting roughed up.

Cyberwar?, Cyber Espionage?


Returning to the title of this post, do targeted attacks on industry constitute cyberwar? Probably not, especially if there is no reciprocation. Are they espionage? Not really, at least not according to most people’s definition, because the data targeted isn’t directly related to the government but is largely economic in nature. If I were going to put targeted, persistent attacks on industry under a single moniker, I’d label them “Economic Espionage”.

A major motivation in writing this post is to voice my concern about a very serious threat to our long term prosperity and to add my voice to the others claiming that these attacks are real: they are happening today at an alarming rate. I normally don’t like doing it this way, but I’ve pointed out a serious problem without providing any suggestions for remedying it; I hope to provide my thoughts on what needs to be done in a future post. Targeted attacks on industry are real, and they pose a serious threat to our long term prosperity.

Thursday, August 26, 2010

Vortex Howto Series: Demo VM Image

(Updated 10/16/2010) Doug Burks just informed me that he's included vortex in his Security Onion liveCD. See comments. In many ways, this is probably a superior way to kick the tires on vortex because if you run it on real hardware with multiple cores, you can actually see the benefits of parallelism. You can also easily and directly compare vortex to full IDS platforms like Snort or Bro, as well as to other smaller utilities like tcpick (vortex hopefully providing some value add somewhere). Note that Security Onion Live doesn't include libBSF, but most people don't use that extensively anyway. I gave Security Onion Live a quick test drive and highly recommend it. The VM image below will remain available for (slow) download in the event anyone finds it useful.



In order to make vortex, especially my vortex howto series, more accessible, I've created a VMware image. The image is a basic install of CentOS with all the prerequisites for the vortex howto series installed, including the HTML instructions for offline reading. Only the small pcaps are included, but scripts that download the other data sets are provided.

The intent is to make basic demonstration of vortex very easy; it's as easy as I dare make it. I've tested the content from installments 1 and 2, which were very easy to execute. Unfortunately, installments 3 and especially 4 are difficult to demonstrate in a VM due to the small number of processor cores, the use of 32-bit builds for portability, etc.

The image can be downloaded here. Please excuse the slow download rates. See the included README for more details.

One errata item I've already noticed is that to install the defcon data set using the script provided, you'll need to install ctorrent (e.g. sudo yum install ctorrent). Also, I seem to have had trouble using mergecap to create the whole 7 GB pcap file for defcon; it fails at the 2GB mark, but this amount of data should be adequate for demonstration purposes anyway.

Nergal uncovers another cool 'sploit

I'm really happy to see that Rafal Wojtczuk has gotten a fair amount of press, including a mention on slashdot, for his recent disclosure of a vulnerability allowing execution of code with root privileges. It's not the first of this sort for him and hopefully not the last.

Rafal is the primary developer and maintainer of libnids, the library on which vortex is based. My only contact with Rafal was a short email thread seeking help with libnids: he was most helpful.

Go Nergal!

Monday, July 12, 2010

Reflections on Sans 4n6 and IR summit

I was really pleased with how the Sans 4n6 and IR Summit turned out. More than anything else, it was a great opportunity to network with and hear from some of the thought leaders in 4n6 and IR. Coming from a team that has a lot of experience with IR, especially APT, I probably gained more from side conversations than anything else. I was really impressed with the heavy focus on APT and the surprisingly on-point discussions about it. Rob Lee did a great job organizing this.

Being primarily focused on IR tool development, I was happy with the high amount of respect SW developers were given. More than once, the point was made that you need really smart people creating capabilities if your (really smart) analysts are to have a chance of keeping up with APT. When I romanticize my work, I fancy myself as Q, equipping our 00* analysts with the best armaments out there. Normally SW engineers are second only to end users when it comes to abuse by security folk. Overall, there was very limited bashing of end users, and even less bashing of SW engineers. I think this demonstrates the level of understanding of APT at the summit, including the realization that persistent attackers are best dealt with through a threat focused response, or as Mike Cloppert has so effectively expressed it: security intelligence.

I was impressed with the amount of discussion of community involvement at the conference, from technical folk volunteering to help local law enforcement to the quiescent response to APT by the federal government. In fact, in my mind, the award for best slides of the summit should go to Richard Bejtlich, for his slides on what the US government should do in response to APT. If you want an uncomfortable chuckle, they’re definitely worth the click.

For those who haven’t found it yet, the slides are here.

Wednesday, June 23, 2010

Flushing out Leaky Taps

Updated 03/17/2012: This article is now deprecated. Please see the revamped version: http://smusec.blogspot.com/2012/03/flushing-out-leaky-taps-v2.html.

Many organizations rely heavily on their network monitoring tools. Network monitoring tools that operate on passive taps are often assumed to have complete network visibility. While most network monitoring tools provide stats on the packets dropped internally, most don’t tell you how many packets were lost externally to the appliance. I suspect that very few organizations do an in-depth verification of the completeness of tapped data or quantify the amount of loss that occurs in their tapping infrastructure before packets arrive at the network monitoring tools. Since I’ve seen very little discussion on the topic, this post will focus on techniques and tools for detecting and measuring tapping issues.

Impact of Leaky Taps


How many packets does your tapping infrastructure drop before ever reaching your network monitoring devices? How do you know?

I’ve seen too many environments where tapping problems have caused network monitoring tools to provide incorrect or incomplete results. Often these issues last for months or years without being discovered, if ever. Making decisions based on bad data is never good. Many public packet traces also exhibit the types of visibility issues I will discuss.

One thing to keep in mind when worrying about loss due to tapping is that you should solve, or at least quantify, any packet loss inside your network monitoring devices before you worry about packet loss in the taps. You need to have strong confidence in the accuracy of your network monitoring devices before you use data from them to debug loss in your taps. Remember, in most network monitoring systems there are multiple places where packet loss is reported. For example, using tcpdump on Linux, you have both the dropped packets reported by tcpdump and the packets dropped by the network interface (ifconfig).
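For example, on a typical Linux sensor you can check both counters directly (the interface name here is just an example):

$ tcpdump -i eth1 -w capture.pcap   # on exit, tcpdump prints packets captured,
                                    # received by filter, and dropped by kernel
$ ifconfig eth1                     # the RX "dropped" counter reflects packets discarded
                                    # by the interface/driver before capture
$ ethtool -S eth1                   # many drivers expose more detailed drop/error counters
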

I’m not going to discuss in detail the many things that can go wrong in getting packets from your network to a network monitoring tool. For a quick overview on different strategies for tapping, I’d recommend this article by the argus guys. I will focus largely on the resulting symptoms and how to detect, and to some degree, quantify them. I’m going to focus on two very common cases: low volume packet loss and unidirectional (simplex) visibility.

Low volume packet loss is common in many tapping infrastructures, from span ports up to high end regenerative tapping devices. I feel that many people wrongly assume that taps either work 100% or not at all. In practice, it is common for tapping infrastructures to drop some packets such that your network monitoring device never even gets the chance to inspect them. Many public packet traces include this type of loss. Very often this loss isn’t even recognized, let alone quantified.

The impact of this loss depends on what you are trying to do. If you are collecting netflow, then the impact probably isn’t too bad since you’re looking at summaries anyway. You’ll have slightly incorrect packet and byte counts, but overall the impact is going to be small. Since most flows contain many packets, totally missing a flow is unlikely. If you’re doing signature matching IDS, such as snort, then the impact is probably very small, unless you win the lottery and the packet dropped by your taps is the one containing the attack you want to detect. Again, the stats are in your favor here; most packet based IDSs are pretty tolerant of packet loss. However, if you are doing comprehensive deep payload analysis, the impact can be pretty severe. Let’s say you have a system that collects and/or analyzes all payload objects of a certain type--it could be anything from emails to multi-media files. If you lose just one packet used to transfer part of a payload object, you can impact your ability to effectively analyze that payload object. If you have to ignore or discard the whole payload object, the impact of a single lost packet is significantly multiplied in that many packets’ worth of data can’t be analyzed.

Another common problem is unidirectional visibility. There are sites and organizations that do asymmetric routing such that they actually intend to tap and monitor unidirectional flows. Obviously, this discussion only applies to situations where one intends to tap a bi-directional link but only ends up analyzing one direction. One notorious example of a public data set suffering from this issue is the 2009 Inter-Service Academy Cyber Defense Competition.

Unidirectional capture is common, for example, when using regenerative taps, which split tapped traffic into two links based on direction, but only one directional link makes it into the monitoring device. Most netflow systems are actually designed to operate well on simplex links, so the adverse effect is that you only get data for one direction. Simple packet based inspection works fine, but more advanced, and usually rare, rules or operations using both directions obviously won’t work. Multi-packet payload inspection may still be possible on the visible direction, but it often requires severe assumptions to be made about reassembly, opening the door to classic IDS evasion. As such, some deep payload analysis systems, including vortex and others based on libnids, just won’t work on unidirectional data. Simplex visibility is usually pretty easy to detect and deal with, but it often goes undetected because most network monitoring equipment functions well without full duplex data.

External Verification


Probably the best strategy for verifying network tapping infrastructure is to perform some sort of comparison of data collected passively with data collected inline. This could be as simple as comparing packet counts on routers or end devices to packet counts on a network monitoring device. For higher order verification, you can compare network transaction logs from an inline or end device against passively collected transaction logs. For example, you could compare IIS or Apache webserver logs to HTTP transaction logs collected by an IDS such as Bro or Suricata. These verification techniques are often difficult: you’ve got to deal with issues such as clock synchronization and offsets (caused by buffers in the tapping infrastructure or IDS devices), differences in the data sources/logs used for concordance, etc. This is not trivial, but it often can be done.
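As a crude first pass (the paths, date, and address below are hypothetical, and every environment’s logs will differ), even simple request counts over the same window can reveal gross visibility problems:

$ grep -c "23/Jun/2010" /var/log/httpd/access_log   # requests the web server itself logged that day
$ grep -c "192.0.2.10" sensor_http_2010-06-23.log   # HTTP transactions the passive sensor logged
                                                    # to/from that server on the same day

Large, unexplained discrepancies (after accounting for clock skew, virtual hosts, load balancers, and the like) suggest the tap or the sensor is missing traffic.
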

Usually the biggest barrier to external verification of tapping infrastructure is the lack of any comprehensive external data source. Many people rely on passive collection devices for their primary and authoritative network monitoring. Often times, there just isn’t another data source to which you can compare your passive network monitoring tools.

One tactic I’ve used to prove loss in taps is to use two sets of taps such that packets must traverse both. If one tap sees a packet traverse the network and the other doesn’t, and both monitoring tools claim 0 packet loss, you know you’ve got a problem. I’ve actually seen situations where each of the two network monitoring devices missed some packets, but the missing packets from the two traces didn’t overlap.
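A quick and dirty way to compare two such captures of the same traffic (ignoring minor reordering, and assuming both taps saw the same headers; file names here are hypothetical) is to diff timestamp-stripped packet summaries:

$ tcpdump -r tapA.pcap -nn -t > tapA.txt   # -t drops the timestamp so identical packets
$ tcpdump -r tapB.pcap -nn -t > tapB.txt   # produce identical summary lines
$ diff tapA.txt tapB.txt                   # lines unique to one file are packets the other tap missed
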

Inferring Tapping Issues


While not easy, and not necessarily as precise or as complete as comparing against external data, it is possible to use network monitoring tools to infer visibility gaps in the data they are seeing. Many network protocols, namely TCP, provide mechanisms specifically designed to ensure reliable transport of data. Unlike an endpoint, a passive observer can’t simply ask for a retransmission when a packet is lost. However, a passive observer can use the mechanisms the endpoints use to infer whether it missed packets passed between endpoints. For example, if Alice sends a packet to Bob which the passive observer Eve doesn’t see, but Bob acknowledges receipt to Alice and Eve sees the acknowledgement, Eve can infer that she missed a packet.

Data and Tools


To keep the examples simple and easily comparable, I've created 3 pcaps. The full pcap contains all the packets from an HTTP download of the ASCII "Alice in Wonderland" from Project Gutenberg. The loss pcap is the same except that one packet, packet 50, was removed. The half pcap is the same as the full pcap but only contains the packets going to the server, without the packets going to the client.

For tools, I’ll be using argus and tshark to infer packet loss in the tap. Argus is a network flow monitoring tool. Tshark is the CLI version of the ever popular wireshark. Since deep payload analysis systems are often greatly affected by packet loss, I’ll explain how the two types of packet loss affect vortex.

Low Volume Loss in Taps


Detecting and quantifying low volume loss can be difficult. The most effective tool I've found for measuring it is tshark, specifically the tcp.analysis.lost_segment flag.

Note that this easily identifies the lost packet at position 50:


$ tshark -r alice_full.pcap -R tcp.analysis.lost_segment
$ tshark -r alice_loss.pcap -R tcp.analysis.lost_segment
50 0.410502 152.46.7.81 -> 66.173.221.158 TCP [TCP Previous segment lost] [TCP segment of a reassembled PDU]


I’ve created a simple (but inefficient) script that can be used on many pcaps. Since tshark doesn’t release memory, you’ll need to use pcap slices smaller than the amount of memory in your system. The script is as follows:


#!/bin/bash
# Reads a list of pcap files on STDIN; for each file, reports the percentage
# of TCP packets flagged as lost along with the bandwidth reported by capinfos.

while read file
do
    total=`tcpdump -r "$file" -nn "tcp" 2>/dev/null | wc -l`
    errors=`tshark -r "$file" -R tcp.analysis.lost_segment | wc -l`
    percent=`echo $errors $total | awk '{ print $1*100/$2 }'`
    bandwidth=`capinfos "$file" | grep "bits/s" | awk '{ print $3" "$4 }'`
    echo "$file: $percent% $bandwidth"
done


Updated 02/21/2011: Most people will want to use "tcp.analysis.ack_lost_segment" instead of "tcp.analysis.lost_segment". See bottom of post for details.
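
In other words, with that correction the error-counting line in the script above becomes:


errors=`tshark -r "$file" -R tcp.analysis.ack_lost_segment | wc -l`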

It is operated by piping it a list of pcap files. For example, here are the results from the slices of the defcon17 Capture the Flag packet captures:


$ ls ctf_dc17.pcap0* | calc_tcp_packet_loss.sh
ctf_dc17.pcap000: 0.44235% 34751.40 bits/s
ctf_dc17.pcap001: 0.584816% 210957.26 bits/s
ctf_dc17.pcap002: 0.615856% 173889.57 bits/s
ctf_dc17.pcap003: 0.51238% 165425.21 bits/s
ctf_dc17.pcap004: 0.343817% 253283.86 bits/s
...


Note that I haven’t done any sort of serious analysis of this data set. I assume there were some packets lost, but don’t know for sure. I’m just inferring. Also, assuming there are some packets missing, I will never know if this was a tapping issue, network monitoring/packet capture issue, or both.

In the case of low volume loss in taps, netflow isn't particularly helpful.


$ argus -X -r alice_full.pcap -w full.argus
$ ra -r full.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 121 0
$ argus -X -r alice_loss.pcap -w loss.argus
$ ra -r loss.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 120 0


Note that there is one fewer dpkt (destination packet). Other than the packet counts, there is no way to know that packet loss occurred. I'd swear I've seen other cases where argus actually gave an indication of packet loss in either the loss count or the flags, but that's definitely not happening here. Note that loss in most network flow monitoring tools refers to packets lost by the network itself (observed via retransmission), not loss in the taps, which has to be inferred.

Vortex basically gives up on trying to reassemble a TCP stream if a packet is lost and the TCP window is exceeded. The stream gets truncated at the first hole and remains in limbo until it idles out or vortex closes.


$ vortex -r alice_full.pcap -e -t full
Couldn't set capture thread priority!
full/tcp-1-1276956774-1276956775-c-168169-66.173.221.158:55812s152.46.7.81:80
full/tcp-1-1276956774-1276956775-c-168169-66.173.221.158:55812c152.46.7.81:80
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 168169 VTX_EST: 1 VTX_WAIT: 0 VTX_CLOSE_TOT: 1 VTX_CLOSE: 1 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 0 VTX_BSF: 0

$ vortex -r alice_loss.pcap -e -t loss
Couldn't set capture thread priority!
loss/tcp-1-1276956774-1276956774-e-31056-66.173.221.158:55812s152.46.7.81:80
loss/tcp-1-1276956774-1276956774-e-31056-66.173.221.158:55812c152.46.7.81:80
VORTEX_ERRORS TOTAL: 2 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 2 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
Hint--TCP_QUEUE: Investigate possible packet loss (if PCAP_LOSS is 0 check ifconfig for RX dropped).
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 31056 VTX_EST: 1 VTX_WAIT: 0 VTX_CLOSE_TOT: 1 VTX_CLOSE: 0 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0


Note that there are fewer bytes collected, vortex warns about packet loss, there are TCP_QUEUE errors, and the stream doesn’t close cleanly in the loss pcap.

Simplex Capture


Simplex capture is actually pretty simple to identify. It's only problematic because many tools don't warn you when it occurs, so you often don't even know it's happening. The straightforward approach is to use netflow and look for flows with packets in only one direction.


$ argus -X -r alice_half.pcap -w half.argus
$ ra -r half.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 0 0


This couldn't be clearer. There are packets in only one direction. If you use a really small flow record interval, you'll want to do some flow aggregation to ensure you get packets from both directions in a given flow record. Note that argus creates bidirectional flow records by default. If your netflow system creates unidirectional flow records, you need to do a little more work, like associating the two unidirectional flows and making sure both sides exist.
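
A minimal sketch of that association, assuming your unidirectional flow records have been exported as whitespace-separated text, one flow per line, with the fields proto, source IP, source port, destination IP, destination port (the file name and field layout are hypothetical):


# Normalize each unidirectional record to a direction-agnostic conversation key,
# remember which direction(s) were observed, and report conversations where
# only one direction ever appeared.
awk '{
    fwd = ($2":"$3 < $4":"$5)
    key = fwd ? $1" "$2":"$3" "$4":"$5 : $1" "$4":"$5" "$2":"$3
    if (fwd) f[key] = 1; else r[key] = 1
}
END {
    for (k in f) if (!(k in r)) print "one direction only: " k
    for (k in r) if (!(k in f)) print "one direction only: " k
}' flows.txt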

You could also use tshark or tcpdump and see that for a given connection, you only see packets in one direction.
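
For instance, against the half pcap, every packet in the connection should show the client (66.173.221.158) as the source and nothing coming back the other way (port 55812 is the client's ephemeral port from the argus output above):


$ tcpdump -nn -r alice_half.pcap 'tcp and port 55812'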

Vortex handles simplex network traffic in a straightforward, albeit somewhat lackluster, manner--it just ignores it. LibNIDS, on which vortex is based, is designed to overcome NIDS TCP evasion techniques by exactly mirroring the functionality of the TCP stack, but it assumes full visibility (no packet loss) to do so. If it doesn't see both sides of a TCP handshake, it won't follow the stream because a full handshake hasn't occurred. As such, the use of vortex on the half pcap is rather uneventful:


$ vortex -r alice_half.pcap -e -t half
Couldn't set capture thread priority!
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 0 VTX_EST: 0 VTX_WAIT: 0 VTX_CLOSE_TOT: 0 VTX_CLOSE: 0 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 0 VTX_BSF: 0


The most optimistic observer will point out that at least vortex makes it clear when you don’t have full duplex traffic--because you see nothing.

Conclusion


I hope the above is helpful to others who rely on passive network monitoring tools. I've discussed the two most prevalent tapping issues I've seen personally. One topic I've intentionally avoided, because it's hard to discuss and debug, is interleaving of aggregated taps, especially issues with timing. Assume you do some amount of tap aggregation, especially aggregation of simplex flows, either using an external tap aggregator or bonded interfaces inside your network monitoring system. If enough buffering occurs, packets from each simplex flow may be interleaved incorrectly; for example, a SYN-ACK may end up in front of the corresponding SYN. There are other subtle tapping issues, but the two discussed above are by far the most prevalent problems I've seen. Verifying or quantifying the loss in your tapping infrastructure even once is above and beyond what many organizations do. If you rely heavily on the validity of your data, consider doing this periodically or automatically so you detect any changes or failures.
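
As a sketch of what automating that might look like (the paths, schedule, and mail recipient are all hypothetical), a cron entry could feed yesterday's capture slices to the loss script from earlier in this post and mail the report:


# hypothetical crontab entry: nightly tap-loss report (note the escaped % signs)
0 6 * * * ls /data/pcap/*$(date -d yesterday +\%Y\%m\%d)* | /usr/local/bin/calc_tcp_packet_loss.sh | mail -s "tap loss report" netops@example.com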


Updated 02/21/2011: I need to clarify and correct the discussion about low volume packet loss. The point of this post was to talk about packet loss in tapping infrastructure--packets that are successfully transferred through the network but which don't make it to passive monitoring equipment. This is actually pretty common in low end tapping equipment, such as span ports on switches or routers. My intention was not to talk about normal packet loss that occurs in networks, usually due to network congestion. I messed up. I have two versions of the above script floating around--one that measures packets “missed” by the network monitor and one that measures total packets “lost” on the network. I used the wrong one above.

Let me explain more. When I say “missed” I mean packets that traversed the network being monitored but didn't make it to the monitoring device. Ex. they were lost during tapping/capture. When I say “lost” packets, I mean packets that the monitoring device anticipated but didn't see for whatever reason. They could have been dropped on the network (i.e. congestion) or dropped in the tapping/capture process. One really cool feature of tshark is that you can easily differentiate between the two. The tcp.analysis.ack_lost_segment filter matches all packets which ACK a packet (or packets) not seen in the packet trace. The official description is: “ACKed Lost Packet (This frame ACKs a lost segment)”. While your monitoring device didn't see the ACK'd packets, the other endpoint in the communication presumably did, because it sent an ACK. The implication is that you can infer with strong confidence that the absent packets were actually transferred through the network but were “missed” by your capture. This feature of tshark is the best way I've found to identify packet loss occurring in passive network tapping devices or in network monitors that isn't reported in the normal places in network sensors (pcap dropped, ifconfig dropped, ethtool -S). In normal networks with properly functioning passive monitoring devices, “ack_lost_segment” should be zero.
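
So, for example, running the corrected filter against the two pcaps from earlier should flag the loss pcap (the client ACK covering the missing segment) but stay silent on the full one, subject to the tshark version caveat in the update below:


$ tshark -r alice_full.pcap -R tcp.analysis.ack_lost_segment
$ tshark -r alice_loss.pcap -R tcp.analysis.ack_lost_segment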

On the other hand, the mechanism I mistakenly demonstrated above calculates packets lost for any reason, usually either congestion on the network being monitored or deficiencies in network monitoring equipment. The description of tcp.analysis.lost_segment is: “Previous Segment Lost (A segment before this one was lost from the capture)”. For the purposes of verifying the accuracy of your network monitoring equipment, any loss due to congestion is a red herring. While this mechanism certainly does report packets “missed” by your network monitoring equipment, it will also report those “lost” for any other reason. I keep this version of the script around to look at things like loss due to congestion. It may well be useful for passively studying where loss due to congestion is occurring, such as you might do if you are studying buffer bloat. In networks subject to normal congestion, “lost_segment” should be non-zero.

Please excuse this mistake. I try hard to keep my technical blog posts strictly correct, very often providing real examples.




Updated 07/07/2011: György Szaniszló has proposed a fix for wireshark that ensures all “ack_lost_segment” instances are actually reported as such. In older versions of tshark, there were false negatives (instances where “ack_lost_segment” should have been reported but wasn't) but no false positives (all reported instances of “ack_lost_segment” were correct). As such, with György's fix, tshark should provide more accurate numbers in the event of loss in tapping infrastructure. The old versions of tshark are still useful for confirming that you have problems with your tapping infrastructure (I've had decent success with them), but clearly are not as accurate for comprehensively quantifying all instances of loss in your taps. In his bug report, he does a great job explaining the different types of loss, which he terms “Network-side” and “Monitor-side”. He also provides an additional trace for testing.

Cloppert on Defining APT Campaigns

Michael Cloppert has posted another installment in his long running series on security intelligence. In his latest, Defining APT Campaigns, he discusses the how and why behind a threat focused approach to categorizing attack activity. More important than the how, when combined with his previous articles in this series, he gives a clear explanation of the why.

If you are somehow responsible for responding to targeted attackers, you should understand why security intelligence or a threat focused response is so critical. This is how you consistently stop and analyze attacks before compromises occur. This is how you build resilient defenses that transcend the vulnerability du jour. This is how you get a leg up on the attackers and make repeated attacks harder for them.

I have to say, when I was first exposed to security intelligence, I was a little skeptical. My thought was "that's cool that we can understand so much about the attacker, but what's the point?". Well, the point is, the more visibility you have into an attack sequence, the more an attacker has to change to make the next attack successful. You can also stop attacks sooner, saving time on damage assessment and cleanup, which allows you to spend more time preparing for the next attack. After seeing how effective this approach is against APT, I'm a believer. I can't count the number of attacks, including 0-day exploits, that I have seen effectively mitigated because of common indicators or techniques shared between attacks in the same campaign.

Lastly, he touches on the criticality of developing tools for threat focussed incident response and detection. Clearly this warms my heart.

Wednesday, June 16, 2010

Flashback to my Commodore 128

While some of my colleagues were having nightmares responding to more Adobe 0-days, I've been on vacation, having pleasant flashbacks of my own.

While I don't normally indulge like this, I just couldn't pass up posting this picture, which I found going through old pictures with my grandfather. It's a picture of our newly set up Commodore 128, complete with joysticks and tractor-feed dot matrix printer. I think the picture was taken on or soon after Christmas in 1987, give or take a year.



I think all of us have sweet memories of our first computers. The Commodore 128 was mine. I have great memories of playing games and using The Print Shop. It makes me laugh seeing the huge smile I had on my face then.

Saturday, May 29, 2010

Security Engineering Is Not The Solution to Targeted Attacks

Recent publicity and lessons from the school of hard knocks have significantly increased the visibility of targeted attacks. Many organizations react to targeted attacks by pouring on yet more of the traditional reactive security measures that didn't work in the first place. Many also institute draconian rules and procedures for their users. While stepping up security infrastructure and user awareness training is often necessary, it can never completely solve the targeted attack problem, at least not without inflicting unreasonable and probably impractical restrictions on the organization's personnel and IT infrastructure.

There's been a fair amount of buzz about Michal Zalewski's article entitled Security engineering: broken promises. He does a very good job of summarizing some of the open issues with security engineering. I do think he's probably a little pessimistic, missing some opportunities to give credit, and I think it's unfair to claim security engineering has failed because it hasn't produced a unified model that can ensure security. However, he's pulled together a lot of different facets of security engineering in a short article. The field of security engineering does need to continue to seek to eliminate vulnerabilities that are being exploited widely, and to do so efficiently. Much of his discussion can be generalized beyond software security to information security as a whole.

While Zalewski didn't address or mention APT, I've heard similar (but usually not so complete or well worded) rants about the failings of security best practices in regards to APT. It really pains me to hear people trash security engineering, especially in the context of Aurora and similar attacks. I've also heard a fair amount of "sky is falling" and "security best practices can't keep you safe from APT". Blaming security engineering for failing to stop targeted attacks doesn't make sense when that was never a requirement of most systems. Furthermore, we don't want security engineering alone to solve this problem anyway.


Engineering


Engineering is about applying science to provide solutions that meet well defined parameters. These parameters involve all sorts of things like functionality, cost, reliability, etc. Many of these parameters are conflicting, at least apparently. Because we live in a world with scarce resources, engineering seeks to provide the optimum value for all the various parameters.

While security has some unique characteristics, it can be viewed as another parameter of a system. While I agree that if done right, security doesn’t have to be as painful as we often make it, security does often conflict with other parameters such as flexibility, cost, and functionality. As such, a wise engineer only invests as much effort in making a system secure as is required.

It amazes me that physics envy rages so strongly in some people's hearts and minds that they actually lose sight of the imperfections of both theoretical and applied physics. People who expect a comprehensive model to cover all aspects of security, much like the ever nebulous theory of everything, have a long time to wait, very possibly infinitely long, though even that is probably impossible to prove. Furthermore, many of the simple physics models are hard to actually apply in the real world due to the many different phenomena that need to be modeled simultaneously. The massless, frictionless, point objects that we hear so much of in physics exercises must only exist in a vacuum, because I've never seen them. Practical application of physics isn't as easy as the models often make it appear. That's alright though. Using classic Newtonian physics works in a great many situations and helps me understand the world around me. Scientific models always have limitations in their applicability, but that doesn't negate their value. While it quite often literally requires a group of rocket scientists, very often using a mix of multiple models and simulations, we've been able to do a great many things based on physics without having a theory of everything.


Success of Security Engineering


Formal methods are hard. Formally verifying a system is only practical for the simplest of systems. However, it has been done. Ex. flight control systems or highly secure classified systems. We refrain from formally verifying all the systems we build not because we can't or don't know how, but because it's just too hard. It requires too much effort and restricts the functionality and flexibility of the resulting systems too much for most people's tastes. Most people couldn't, or at least wouldn't want to, do their day to day work on one of these highly verified, and therefore highly restricted, systems.

While not my favorite, using risk mitigation strategies is very effective in certain circumstances. It's useful where the risk can be quantified and accurately predicted. A prime example of this is the risk associated with identity theft. Many financial institutions effectively apply risk calculations to determine whether a given measure that would reduce losses due to identity theft would cost more to implement than simply accepting the losses. As long as the losses can be accurately calculated a priori, this method is very valid.

Again, I realize that there is plenty of room for improvement in the field of security engineering. Regardless, for most threats, security engineering is rather successful overall. We know how to make systems more secure than they are now, but we prefer not to. So if systems aren't as secure as they should be, it's usually because we didn't design them to be secure. Sure, part of engineering is finding solutions that satisfy multiple parameters at the same time. These advances will continue to make security more compatible with ease of use and flexibility. Improved standards will continue to raise the minimum bar of security while minimizing the additional cost of doing so. However, I believe most systems are secure enough, or at least as secure as we wanted them to be.

It should be noted that the adequacy of the security provided by most systems is not provided solely by the system itself but is supported by external factors such as legal protections. For example, the physical security of most houses is only good enough to make it difficult, well maybe even only inconvenient, for would-be burglars. The vast majority of the deterrence comes in fear of getting caught. Furthermore, insurance provides a very cost effective means of protecting your investments despite the remote risk of burglary.


Security Engineering Can’t Solve Targeted Attacks


The biggest problem with targeted attacks isn’t that security engineering couldn’t provide effective solutions. Our current systems aren’t secure enough to protect us from targeted attacks because we haven’t asked them to be that secure. Furthermore, I don’t think we want them to be that secure. Even if it was possible to make a machine that was 100% secure, I doubt it could ever be used for much of consequence while maintaining that level of security due to weaknesses in the environment, people, and processes.

Let’s return to the example of the residential physical security. Imagine if you took away the deterrence offered by law enforcement. It’s hard to imagine, but let’s say would-be attackers had basically no external deterrence and the only thing between them and your possessions in your house was you and your house. You’d have to go to some very extreme measures to keep your house secure. Simple locks and even an alarm system wouldn’t cut it. Basically in absence of any other deterrent, to defeat a rational burglar the defenses on your house would have to cost the attacker more to circumvent than the value that he could gain from sacking the house. This is a tough asymmetric situation where your defenses have to be perfect and the persistent burglar only has to find one weakness or one weak moment. He can try over and over again, as failed attempts don’t cost him much. It doesn’t take much imagination to see how living in a house like this wouldn’t be much fun.

Ok, now pretend you have something valuable to a small set of burglars. Let's say you have something like a highly coveted recipe for cinnamon rolls. Let's say a small set of burglars really want to make their own sweet buns instead of buying yours, and possibly sell them to your customers. The problem is that you can't take out an insurance policy on your roll recipe very easily. How could you quantify the cost of exposure? How could you prove the secret was really lost if you suspected it was? How many times would insurance compensate you: only on the first loss, or on all subsequent losses? Insurance just doesn't work in this case. Insurance policies work great for easily replaceable items like televisions, cars, etc., but they just don't work well for things like trade secrets.

Targeted attacks are much like the scenario laid out above. Sadly, there is little to no deterrence. Usually the information targeted is highly valuable but not easily quantifiable. Lastly, while it technically would be possible to engineer defenses that would be effective, very few people really want to live in the resulting Fort Knox vault, let alone pay for its construction.


Alternatives to Security Engineering


So if it's not feasible to pursue a pure engineering solution to defend against targeted attacks, what is to be done? First of all, a lot of other non-technical solutions should be pursued. I'll refrain from discussing legal, political, diplomatic, military, etc. solutions because most of us have only minor influence in these domains and my experience is pretty thin in these areas. However, I do think it's clear that in many cases, non-technical solutions would be the most effective solutions to the problem. It should also be clear from the empty public statements made by many leaders and decision makers in this realm that non-technical solutions on an international scale are going to take a while, if they ever come.

Security engineering is part of the solution. In many cases, we do need to engineer more secure solutions. We need to make security cheaper and easier. However, even with the best minds on the problem, this will only help so much. While our users need to improve their resilience to social engineering, in many cases targeted attacks are done so well that I couldn't fault a user for being duped.

Previously I discussed how keeping targeted attacks secret kills R&D. In that case, I wasn't speaking of security engineering as applied to all IT systems, but was referring to the small subset of IT infrastructure dedicated primarily to security (ex. IDS, SIMS, etc). In that post I echoed the claim of others that threat focused response or security intelligence is one of the most effective approaches to responding to targeted attacks. Surely, this incident response approach will require some engineering of supporting tools, in addition to the general security engineering that will come out of proper incident response. Correctly prioritizing your engineering resources to deal with targeted attacks will often result in allocation of resources to tools that support an intelligence driven response.

I often imagine that a well functioning threat focused incident response team facing targeted attacks is much like the wolf and sheepdog cartoon. While the sheep aren’t particularly well protected, and really can’t be if they are to graze successfully, the sheepdog watches for the ever present wolf. The sheepdog keeps track of the wolf and counters his efforts directly, instead of trying to remedy every possible vulnerability. I recognize that as the sheepdog is invariably successful, this comparison is a little more ideal than reality will probably ever be. However, focusing a concentrated intelligence effort on a relatively small group of highly sophisticated attackers makes a lot of sense as long as the group of advanced attackers is small and the effort to defend against them is much higher than against other vanilla threats.

I've done both security engineering and engineering for security intelligence. Both have their place. Both have their success stories and both have numerous opportunities for improvement. However, blaming security engineering for the impact of targeted attacks is a herring as red as they come. A world where security engineering actually tried to solve for highly targeted and determined attackers would not be a fun place in which to live. In absence of other solutions, an intelligence driven incident response model is your best bet. If I haven't been able to convince you of this, then all I have to say is that Chewbacca lives on Endor and that just doesn't make sense… Blaming security engineering for targeted attacks: that does not make sense.

Thursday, May 20, 2010

Panel and Preso at SANS 4n6 and IR Summit

I’m honored to have been asked to be part of the SANS 2010 What Works in Forensics and Incident Response Summit. I’ll be part of a panel discussion on network forensics and will be presenting on the topic of “Network Payload Analysis for Advanced Persistent Threats”.

The agenda includes some presentations and panel discussions by a large number of the thought leaders in the field of incident response and digital forensics. This is an excellent opportunity to hear from those with experience responding to highly targeted attacks. I'm really looking forward to participating.

Saturday, May 1, 2010

Vortex Howto Series: Parallel NRT IDS

To fulfill all the major tasks I promised when I began this series of vortex howto articles, this installment will focus on scaling up the network analysis done with vortex in a way that leverages the highly parallel nature of modern servers. While the techniques shared in this post are applicable to all the uses of vortex demonstrated so far, they are especially applicable to near real-time network analysis, a major goal of which is to support detections not possible with conventional IDS architectures, including high latency and/or computationally expensive analysis. If you are new to NRT IDS and its goals, I recommend reading about snort-nrt, especially this blog post, which explains why some very useful detection just can't be done in traditional IDS architectures. As we're going to build upon the work done in installment 3, I highly recommend reading it if you haven't.

Many of us learned about multiprocessing and its advantages in college. In cases where you have high latency analysis, which often is caused by IO such as querying a DB, multiprocessing allows you to efficiently keep your processor(s) busy while accomplishing many high latency tasks in parallel. Traditionally, if you want to do computationally expensive tasks that can’t be done on a single processor, you have two options: use a faster processor or use multiple processors in parallel. Well, if you haven’t noticed, processor speeds haven’t increased for quite some time, but the number of processors in computers has increased fairly steadily. Therefore, as you scale up computationally expensive work on commodity hardware, your only serious choice is to parallelize. While the hard real time constraints of IPS make high latency analysis impossible and computationally expensive analysis difficult, if you are satisfied with near real-time, it’s a lot easier to efficiently leverage parallel processing.

Note that throughout this article, I’m not going to make a clear distinction between multi-threading, multi-processing, and multi-system processing. While text books make a stark differentiation, modern hardware and software somewhat blur the differences. For the purposes of this article, the distinction isn’t really important anyway.

Vortex is a platform for network analysis, but it doesn't care whether the analyzer you use is single or multi-threaded. Vortex works well either way. However, xpipes, which is distributed with vortex, does make it easy to turn a single threaded analyzer into a highly parallel analyzer even if, or especially in the cases where, the analyzer is written in a language that doesn't support threading.

Xpipes borrows much of its philosophy (and name) from xargs. Like xargs it reads a list of data items (very often filenames) from STDIN and is usually used in conjunction with a pipe, taking input from another program. While xargs takes inputs and plops them in as arguments to another program, xpipes takes inputs and divides them between multiple pipes feeding other programs. If you are in a situation where xargs works for you, then by all means, use it. Xpipes was written to be able to fit right between vortex and a vortex analyzer without modifying either, thereby maintaining the vortex interface. Xpipes spawns multiple independent instances of the analyzer program and divides traffic between the analyzers, feeding each stream to the next available analyzer. In general, xpipes is pretty efficient.

Slightly simplifying our ssdeep-n network NRT IDS from our last installment we get:


vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n \
-K 600 | ./ssdeep-n.sh | logger -t ssdeep-n

To convert this to a multithreaded NRT IDS, we would do the following:

vortex -r ctf_dc17.pcap -e -t /dev/shm/ssdeep-n \
-K 600 | xpipes -P 12 -c './ssdeep-n.sh | logger -t ssdeep-n'

Now instead of a single instance of the analyzer we have 12. Our system has 16 processors, so this doesn't fully load the system, but now a larger fraction of the total computing resources is used. Taking a look at this in top:

top - 12:56:25 up 102 days, 19:35, 4 users, load average: 17.30, 16.94, 9.
Tasks: 295 total, 7 running, 288 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.5%us, 54.7%sy, 0.1%ni, 28.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%
Mem: 74175036k total, 73891608k used, 283428k free, 338324k buffers
Swap: 76218360k total, 155056k used, 76063304k free, 72417572k cached

PID VIRT RES S %CPU %MEM COMMAND
10345 66128 3492 R 22.6 0.0 ssdeep-n.sh
10346 66128 3452 S 22.3 0.0 ssdeep-n.sh
10322 66128 3456 R 21.9 0.0 ssdeep-n.sh
10336 66120 3440 R 21.9 0.0 ssdeep-n.sh
10337 66124 3464 R 21.9 0.0 ssdeep-n.sh
10343 66128 3488 R 21.9 0.0 ssdeep-n.sh
10330 66128 3464 R 21.6 0.0 ssdeep-n.sh
10342 66128 3444 S 21.6 0.0 ssdeep-n.sh
10351 66120 3476 S 21.6 0.0 ssdeep-n.sh
10326 66132 3476 S 20.9 0.0 ssdeep-n.sh
10340 66124 3448 S 20.9 0.0 ssdeep-n.sh
10329 66120 3452 S 19.9 0.0 ssdeep-n.sh
10302 350m 297m S 11.3 0.4 vortex
5 0 0 S 0.3 0.0 migration/1
32 0 0 S 0.3 0.0 migration/10

Beautiful, isn’t it?

If run to completion, the multithreaded version finishes in minutes while the single threaded version takes hours.

As should be clear from the above, the -P option specifies the number of children processes to spawn. Typical values of this range from 2 to a few less than the number of processors in the system for highly computationally expensive analyzers. For high latency analyzers you can use quite a few more but there is an arbitrary limit of 1000.

One of the coolest features of xpipes is that it provides a unique identifier for each child process in the form of an environment variable. For each child process it spawns, xpipes sets the environment variable XPIPES_INDEX to an incrementing integer starting at zero. Furthermore, since the command specified is interpreted by shell, XPIPES_INDEX can be used in the command. Imagine that instead of using logger to write a log, we want to write directly to file. If you try something like:

$ vortex | xpipes -P 8 -c "analyzer > log.txt"

You would find that log file gets clobbered by multiple instances trying to write to the file at the same time. However, you could do the following:

$ vortex | xpipes -P 8 -c "analyzer > log_$XPIPES_INDEX.txt"

You’d end up with 8 log files, log_0.txt through log_7.txt which you could cat together if wanted. Similarly, if you want to lock each analyzer to a separate core, say 2-10, you could do something like the following:

$ vortex | xpipes -P 8 -c "taskset -c $[ $XPIPES_INDEX + 2 ] analyzer"

I think you get the idea. Just having a predictable identifier available to both the interpreter shell and the program opens a lot of doors.

Note that if you want to specify the command on the command line you can do so with the -c option. This can admittedly get a little tricky at times because of multiple layers of quoting etc. Alternatively, xpipes can read the command to execute from a file. For example:

$ echo 'analyzer "crazy quoted options"' > analyzer.cmd
$ vortex | xpipes -P 8 -f analyzer.cmd

That’s the basics of parallel processing for NRT IDS the vortex way. So while vortex takes care of all the real time constraints and heavy lifting of network stream reassembly, xpipes takes care of multithreading so all your analyzer has to do is analysis. While vortex’s primary goal has never been absolute performance, I have seen vortex used to perform both computationally expensive and relatively high latency analysis that would break a conventional IDS.

This largely fulfills the obligation I took on when I started this series of vortex howto articles. I hope this has been helpful to the community. I hope that someone who has read the series would be able to use vortex without too much trouble if a situation ever arose where it was the right tool for the job.

If there are other topics you would like discussed/explained, feel free to suggest a topic. For example, I've considered an article on tuning Linux and vortex for lossless packet capture, but I think the README and error messages cover this pretty well. I've also considered discussing the details of the performance relevant parameters in vortex and xpipes, but most of these work very well for most situations without any changes.

Again, I hope this series will help people derive some benefit from vortex. I also want to reiterate my acknowledgments to Lockheed Martin for sharing vortex with the community as open source.