Tuesday, April 20, 2010

Keeping Targeted Attacks Secret Kills R&D

I’m really impressed with Google’s response to what others have dubbed Operation Aurora. I’m impressed for lots of reasons. I’m impressed because they recognize the value of their intellectual property and, when they realized that it was threatened, they took decisive action to protect their interests. I think it’s sad that so many companies in a similar situation would be so blinded by short-sighted lust for the “emerging market” that they fail to protect themselves and fail to recognize that the same market is far from fair or open. I’m impressed that when they apparently felt that the espionage was backed or at least condoned by the Chinese government, they called them out. Most of all, I’m happy they made this public.

That being said, I’m not too impressed that Google, and the majority of the computer security industry for that matter, were caught off guard by these attacks. The level of sophistication and determination is not new, nor is the type of data targeted. For the purpose of this article, when I refer to targeted and sophisticated attacks I’m referring to attacks where one or more attacker groups repeatedly seek to (re-)penetrate an organization’s computer systems for ends specific to the victim organization, typically exfiltration of sensitive information. These attacks are characterized by a high degree of knowledge of the victims, often a high degree of social engineering, adequate technical sophistication, and a high degree of organization/coordination on the part of the attackers. I refrain from using the term advanced persistent threat (APT) because, while it has had a fairly precise meaning among the people using the term for some time, the meaning has been blurred quite a bit of late. For the purposes of this article, the specific identities of the attackers, including affiliation or backing by nation-states, are not important. A few public reports of these sorts of attacks go back to at least the 2003-2005 timeframe, probably earlier, but that’s when I started paying attention. Maybe the one thing that is new is the type of industry targeted. I think Google should have known it was coming. I’ll bet they had some warnings they chose to ignore, but I guess I can’t fault them too much.

The response by the security industry to these attacks is pitiful. Many people recognize that the state of the art, including mainstream enterprise security tools, can’t detect, let alone stop, this sort of activity. While there are a few valiant incident responders who have been dealing with sophisticated targeted attacks for some time, many with a good deal of success, the security vendors have basically ignored their pleas and ideas for improved security tools. I’ve heard vendors say “You don’t want to do that” and “the market for that isn’t big enough for us to implement it”.

What has to happen for the security industry to realize they need to deal with sophisticated targeted attacks? First, organizations need to realize the value of their intellectual property. Second, they need to realize that it’s at risk. I think most organizations are at this point. Third, they need to realize that conventional security wisdom, practices, and tools won’t protect them against this class of attacker, which for some people is new. Unfortunately, all too often, this epiphany only comes after personal and painful experience. Fourth, enough people need to start demanding effective solutions that vendors feel compelled to deliver them and academia recognizes the problems that need researching. Lastly, the solutions (a capable workforce, processes and practices, technology, etc.) need to be developed.

While there are many hindrances, one of the biggest obstacles to effectively dealing with targeted attacks is silence. While this class of attack is far from new, basically no one talks about it. While there are plenty of examples of good public documentation of sophisticated attacks, e.g. Businessweek E-espionage threat, NG’s report on Chinese Espionage, and Mandiant M-trends, basically no one credible steps up and confirms the validity of the data, leaving many to dismiss these reports as sensational journalism, conspiracy theories, and marketing hype. Based solely on public data, I guess I don’t blame people for questioning the reality of this threat until they experience it personally.

This code of silence related to compromises is very detrimental to solving the problem through the various available avenues: political/diplomatic, legal, and security systems including technology and people. There are a lot of reasons given for not broadcasting your status as victim of a sophisticated attack and/or the type of details required to help prevent future occurrences. Most of them I wouldn’t agree with, especially if everyone in the same industry/sector is in the same boat and you all know it. One of the few legitimate reasons to keep details of these attacks secret is that defending against persistent attackers is best achieved through an attacker focused or security intelligence driven approach. But how long is your threat intelligence still useful? Surely keeping specific attack data secret past a year or two doesn’t buy you much in terms of security intelligence, as the most aggressive attackers change tactics and techniques more frequently than this. Hopefully it doesn’t reveal too much about your capabilities either, as they need to be evolving that quickly also. Does refusing to acknowledge you’ve been attacked after your incident response is finished, or at least well under way, buy you anything in terms of threat intelligence? I don’t think so. I admire Google for going public and doing something about it. I’m happy to see some public details, but more details and official acknowledgement from Google would be nice. Sadly, Google is right when they say they’ve already been more open than most others in the industry.

The organizations that keep targeted attacks and the details of them secret are part of the problem, or at the very least, aren’t doing everything necessary to help solve the problem. I think it’s a little hypocritical for organizations to complain about the security industry and academia not addressing this class of threat when no one will talk about the problem publicly with the requisite level of certainty and specificity.

Focusing on security R&D, there are a few things I think need to happen before the security tools industry and academia can start to address targeted attacks. The people doing R&D need to know what types of attacks are actually occurring, they need to understand the importance of a threat focused response model, and they need some decent data.

Understanding the Targeted Attack Scenario

One of the major problems with current academic and applied research is that most researchers don’t understand the basics of a highly targeted attack scenario. They don’t know how serious the problem is. If you tell an academic that the sky is falling because of targeted attacks and give them a high level overview, they’ll either yawn or laugh at you. Case in point, the following hypothetical conversation:

Boots on Ground Responder: We’ve got to do something about these highly socially engineered spear-phishing attacks!

Heads in Clouds Researcher: If you graph the social network, how many nodes away is the sender from the recipient?

Boots on Ground Responder: Uh, 1. Sometimes 2. Sometimes more, it depends.

Heads in Clouds Researcher: Ok, what about the malware? Rootkit? Polymorphism? Any Red pill/Blue pill?

Boots on Ground Responder: In this case nothing like that. Just simple malware that provides a minimal backdoor. Malware isn’t even packed.

Heads in Clouds Researcher: Ok, this stuff isn’t being detected by your AV, IDS, etc but it’s still making it through firewalls, proxies, etc. Any interesting data hiding techniques?

Boots on Ground Responder: No, not really. Malware evades AV because it’s never been seen before. In cases where they need to evade our IDS, they use trivial obfuscation like Caesar ciphers. Usually though, they just hide in plain sight.

Heads in Clouds Researcher: Doesn’t sound too interesting to me. Just patch your systems and tell your users not to click on unsolicited email.

Boots on Ground Responder: Yeah, right. Still, we see repeated patterns in all of these attacks. I can’t give you details, but there’s got to be a way to catch these guys.

Heads in Clouds Researcher: Ok, well I’m going to go back to musing on the trusting trust problem…

The sad part is there are some really interesting problems, true academic problems, but for the most part, academia isn’t seeing them. I don’t think it’s because academia isn’t trying to find good problems to solve, I think it’s because the interesting details aren’t being shared.

Researchers need to learn how different targeted attacks are from opportunistic attacks. They need to understand how the goals and methods differ. They need to understand how different the targeting mechanisms are. They need to understand how valuable an intelligence driven response model is. However, they won’t learn it until someone shows them.

Supporting threat focused response

So much conventional security wisdom and basically all academic research takes a vulnerability focused approach. The focus is on detecting and mitigating individual attacks, not persistent campaigns comprising series of attacks. That’s the best approach for many classes of attacks, but isn’t the best if determined attackers continue attacking the same target over and over again. So many other people have spoken on this topic, that I’ll defer to them and steer my ramblings toward application of these principles to security tool development. For the reader’s reference, I recommend this podcast by some of the thought leaders in this realm. If what they are saying is news to you, check out their blogs, etc.

People doing security R&D have to learn about intelligence driven incident response. While some products support this approach, almost none fully embrace it. Even worse, academia is basically mute on the topic.

One aspect of a threat focused response model that is very important for security R&D is prioritization of response. While I have seen some products and research that recognize the importance of prioritization based on the vulnerability/exploit, basically no security R&D addresses prioritization based on intelligence or attacker identity. Given the following choice, which would you rather detect/block: a stealthy rootkit installed by a botnet for the purpose of identity theft/fraud, or an email containing a link to an exploit which, when visited, gives a sophisticated attacker user level access to the compromised computer? Most academics and many in the security industry would take the former because of impact on the system, but a small group of security professionals will lean hard towards the latter because of impact to the organization’s overall mission.
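To make the contrast concrete, here is a rough Python sketch of triage that ranks alerts by who is behind them first and by host impact second. Everything in it is made up for illustration: the `Alert` fields, the `ACTOR_WEIGHT` table, and the scoring formula are assumptions, not anything an existing product does.

```python
from dataclasses import dataclass

# Hypothetical weights: how much each class of attacker threatens the
# organization's mission. Real values would come from your own intelligence.
ACTOR_WEIGHT = {"targeted": 10, "opportunistic": 2, "unknown": 1}


@dataclass
class Alert:
    name: str
    technical_severity: int  # 1 (low) to 5 (high): impact on the single host
    actor_class: str         # attribution from threat intelligence


def mission_priority(alert: Alert) -> int:
    """Score an alert so attacker class dominates and host impact breaks ties."""
    return ACTOR_WEIGHT.get(alert.actor_class, 1) * 10 + alert.technical_severity


def triage(alerts):
    """Return alerts ordered from most to least important to the mission."""
    return sorted(alerts, key=mission_priority, reverse=True)


alerts = [
    Alert("botnet rootkit (identity theft)", technical_severity=5,
          actor_class="opportunistic"),
    Alert("spear-phish link, user-level access", technical_severity=2,
          actor_class="targeted"),
]

for alert in triage(alerts):
    print(mission_priority(alert), alert.name)
```

With these assumed weights the spear-phishing alert sorts ahead of the rootkit even though its impact on the individual host is lower, which is the ordering the small group of responders above would argue for.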

Another important aspect of threat focused response is the relative importance of prevention and detection. For an intelligence driven response model, detection is king, and prevention is a distant second. In fact, in some cases it might actually be beneficial to not mitigate attacker activity if the attack is or will be mitigated further along in the attack sequence (or kill chain) and if blocking the attack prevents collection of further threat intelligence (e.g. a firewall block). On the flip side, being able to detect an attack, even if it wasn’t or couldn’t be blocked, is imperative. If you look at the bigger picture, being able to block an attack is always the best, but if you can’t or didn’t detect it in real time, detecting it in near real time is often almost as good. While many don’t appreciate it, being able to do historical detections, or understanding how intrusions started, including attacker activity preceding the actual attack, is also important to an intelligence driven response.
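As an illustration of historical detection, the following Python sketch replays an indicator learned today against archived logs to find when and where the attacker first showed up. The log format, hostnames, and domains are hypothetical; a real archive would be proxy logs, netflow, or mail logs in whatever format you keep them.

```python
from datetime import date

# Hypothetical archived proxy log entries: (day, internal host, destination).
ARCHIVE = [
    (date(2010, 1, 4), "ws-017", "news.example.com"),
    (date(2010, 1, 5), "ws-017", "update.badhost.example"),
    (date(2010, 2, 9), "ws-102", "update.badhost.example"),
    (date(2010, 2, 9), "ws-044", "mail.example.org"),
]


def retro_hunt(archive, indicators):
    """Return archived events matching newly learned indicators, oldest first."""
    hits = [event for event in archive if event[2] in indicators]
    return sorted(hits, key=lambda event: event[0])


# Indicator extracted from a newly analyzed incident (made up for this example).
new_indicators = {"update.badhost.example"}

hits = retro_hunt(ARCHIVE, new_indicators)
for day, host, domain in hits:
    print(day.isoformat(), host, domain)
```

The earliest hit is the interesting one: it points at the host and date where the intrusion may have actually started, which can be well before the activity that triggered the investigation.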

Lastly, analysis of unsuccessful attacks is almost ignored by conventional tools and research. However, successful incident responders know the importance of analyzing unsuccessful attacks and developing mitigations across all facets of the attack sequence.

People doing security R&D have to learn to build features supporting threat intelligence into their tools and research.

Irrelevant Data Supports Irrelevant Research

One of the biggest hurdles to overcome for basically any sort of research is obtaining good data. The relative dearth of data related to targeted attacks kills research. If you were a researcher, would you choose a problem for which there is no public data? How could you? Even if you are doing more applied R&D, getting good data isn’t so easy.

There are a couple of approaches to getting data for research: you can either gather the data yourself, or you can use someone else’s data, usually a public data set. The problem with gathering the data yourself is that most researchers will never be able to gather data on targeted attacks. By their very nature, traditional computer security collection mechanisms such as honeypots, honey monkeys, etc. will not normally ever see a targeted attack, and definitely not a persistent campaign of targeted attacks. Even the researchers and vendors that do end up seeing samples representing one phase of targeted attacks, say malware, don’t see the full attack lifecycle. How can you address all phases of the attack if you only see one?

There are good public data sets and there are some that aren’t so great; however, it seems that once a reasonably valid data set is used, it gets used over and over again. I admire folk who put together quality data sets for the community. One infamous example in the realm of incident detection is the DARPA 99 Intrusion Detection Evaluation dataset. While probably a decent data set at the time, and while memories of winnuke, etc. may well be indelibly seared into the minds of some cyber war horses, these sorts of attacks are about as far from targeted attacks as you can get. DARPA 99 has been used and abused for a long time, but people still use it! Why? There aren’t many other options for public data sets. Other decent options for some types of research include packet captures from events like the Defcon CTF and NSA/West Point Competition, but these events are by their very nature very poor sources for persistent and highly targeted attacks.

While it will be necessary to develop good data sets involving targeted attacks, it’s going to be a hard effort. First, to demonstrate a persistent attacker, you need months, even years of data. Second, as attacks have moved up the protocol stack and have become incredibly personalized, sanitizing data is going to be a lot more difficult than scrubbing IP addresses and hostnames. Third, to truly address targeted attacks, tools will have to be configured with information about the data and people using the computer systems (not just the computer systems themselves). What that means for researchers is that to understand the significance of a targeted attack, you have to understand the targeted organization and targeted individuals. Lastly, as incident responders know, to be effective, data needs to be integrated from all phases of the attack and will come in all sorts of formats: logs, netflow or packet captures, malware, etc. It’s clear that a perfect public data set for targeted attacks will never exist, but organizations can take steps by releasing older data.

While I doubt that any quality public data sets will be coming soon, organizations need to learn the value of collecting an internal data set. By nature, incident responders aren’t always the most disciplined at things like collecting and labeling data for historical purposes, especially considering the conditions in which they operate. Regardless, a little bit of effort to compile historical attack data for future reference, including labeling of data, pays huge dividends both in responding to future attacks and in providing good training/test data for new tools.
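The labeling doesn’t have to be elaborate. Below is a minimal Python sketch of one way to keep a labeled index of attack artifacts as JSON lines so it can be searched later or fed to a new tool as test data. The `LabeledArtifact` fields, the case numbering, the phase names, and the file paths are all hypothetical; pick whatever matches how your team already works.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LabeledArtifact:
    incident_id: str           # internal case number (hypothetical scheme)
    observed: str              # ISO date the artifact was seen
    phase: str                 # attack phase, e.g. "delivery", "c2"
    artifact_type: str         # "email", "pcap", "malware", "log", ...
    location: str              # where the raw data is stored
    campaign: str = "unknown"  # analyst's grouping of related attacks
    blocked: bool = False      # was the attack stopped at this phase?


def to_jsonl(records):
    """Serialize records to one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text):
    """Rebuild records from JSON lines, skipping blank lines."""
    return [LabeledArtifact(**json.loads(line))
            for line in text.splitlines() if line]


records = [
    LabeledArtifact("IR-0042", "2010-01-05", "delivery", "email",
                    "archive/IR-0042/message.eml", campaign="campaign-A",
                    blocked=True),
    LabeledArtifact("IR-0057", "2010-02-09", "c2", "pcap",
                    "archive/IR-0057/session.pcap", campaign="campaign-A"),
    LabeledArtifact("IR-0061", "2010-03-01", "delivery", "email",
                    "archive/IR-0061/message.eml"),
]

restored = from_jsonl(to_jsonl(records))
by_campaign = [r for r in restored if r.campaign == "campaign-A"]
print(len(by_campaign), "artifacts labeled campaign-A")
```

Note that the unsuccessful (blocked) attack is indexed alongside the successful one; both belong to the same campaign and both are useful when testing whether a new tool would have caught it.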

Keeping quiet about sophisticated targeted attacks kills, among other things, intelligence driven tool R&D. For the technology to catch up with the threat, the problem needs to be discussed publicly and more details need to be shared. Publicly sharing attack information is critical to the research and development required to catch up technologically with sophisticated attacks. If the code of silence isn't broken, incident responders will continue to flounder with mainstream security tools while security tool vendors will continue to have watershed moments.


  1. What a great post. I love the Boots on Ground Responder vs Heads in Clouds Researcher part...

  2. The best (or saddest) part about that all-too-close-to-reality conversation is that if you show it to someone in security R&D, especially academia, they’ll derive the same painfully funny response that the incident responders do.