Saturday, March 10, 2012

Flushing out Leaky Taps v2

Note: this post is a re-write of a previous post, Flushing out Leaky Taps which I originally posted in June 2010.

Many organizations rely heavily on their network monitoring tools. These tools, which rely on data from network taps, are often assumed to have complete network visibility. While most network monitoring tools provide stats on packets dropped internally, they generally don’t tell you how many packets were lost externally to the appliance. I suspect that very few organizations do an in-depth verification of the completeness of tapped data or quantify the amount of loss that occurs in their tapping infrastructure before packets arrive at network monitoring tools. Since I’ve seen very little clear documentation on the topic, this post will focus on techniques and tools for detecting and measuring tapping issues.

Impact of Leaky Taps


How many packets does your tapping infrastructure drop before ever reaching your network monitoring devices? How do you know?

I’ve seen too many environments where tapping problems have caused network monitoring tools to provide incorrect or incomplete results. Often these issues last for months or years without being discovered, if ever. Making decisions based on bad data is never good. Many public packet traces also include the type of visibility issues I will discuss.

In most instances, you need to worry about packet loss in your monitoring devices before you worry about loss in tapping. In most devices there are multiple places where loss can occur, resulting in multiple places where loss is reported. For example, if running a network monitoring application on Linux, the possible places where loss can occur and be reported (if it is reported) are as follows:






Loss occurring:                   Is reported at:
between Kernel and Application    application (pcap) dropped
between NIC and Kernel            ifconfig dropped
between Tap Feed and NIC          ethtool -S link error
between Network and Tap Feed      N/A

This post focuses primarily on the last item, where loss is not directly observable in the network monitor. In seeking to understand loss occurring outside the network monitor, we must first rule out loss inside it. In other words, the methods presented here seek to identify loss comprehensively: if loss is observed or inferred, and loss inside the network monitoring device can be ruled out, then the remaining loss must be external to it.
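
Before chasing external loss, it's worth checking the counters from the table above on the monitor itself. The sketch below is just a starting point for a Linux box: the interface name (eth1) is a placeholder, and the counter names reported by ethtool -S vary by driver, so treat the grep patterns as examples rather than a definitive list.

# Loss between the kernel and the application is reported by the capture tool
# itself (e.g. tcpdump prints "packets dropped by kernel" on exit).

# Loss between the NIC and the kernel shows up in the interface RX counters:
$ ip -s link show eth1 | grep -A1 "RX:"
$ ifconfig eth1 | grep -i drop

# Loss or link errors between the tap feed and the NIC (counter names vary by driver):
$ ethtool -S eth1 | grep -iE "drop|err|miss"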

I’m not going to discuss in detail the many things that can go wrong in getting packets from your network to a network monitoring tool. For a quick overview on different strategies for tapping, I’d recommend this article by the argus guys. I will focus largely on the resulting symptoms and how to detect, and to some degree, quantify them. I’m going to focus on two very common cases: low volume packet loss and unidirectional (simplex) visibility.

Low volume packet loss is common in many tapping infrastructures, from span ports up to high end regenerative tapping devices. I feel that many people wrongly assume that taps either work 100% or not at all. In practice, it is common for tapping infrastructures to drop some packets such that your network monitoring device never even gets the chance to inspect them. Many public packet traces include loss that could have been caused by the issues discussed here. Very often this loss isn’t even recognized, let alone quantified.

The impact of this loss depends on what you are trying to do. If you are collecting netflow, then the impact probably isn’t too bad since you’re looking at summaries anyway. You’ll have slightly incorrect packet and byte counts, but overall the impact is going to be small. Since most flows contain many packets, totally missing a flow is unlikely. If you’re doing signature-matching IDS, such as snort, then the impact is probably very small, unless you win the lottery and the packet dropped by your taps is the one containing the attack you want to detect. Again, stats are in your favor here. Most packet-based IDSs are pretty tolerant of packet loss. However, if you are doing comprehensive deep payload analysis, the impact can be pretty severe. Let’s say you have a system that collects and/or analyzes all payload objects of a certain type--it could be anything from emails to multi-media files. If you lose just one packet used to transfer part of the payload object, you can impact your ability to effectively analyze that payload object. If you have to ignore or discard the whole payload object, the impact of a single lost packet is significantly multiplied in that many packets' worth of data can’t be analyzed.

Another common problem is unidirectional visibility. There are sites and organizations that do asymmetric routing such that they actually intend to tap and monitor unidirectional flows. Obviously, this discussion only applies to situations where one intends to tap a bi-directional link but only ends up analyzing one direction. One notorious example of a public data set suffering from this issue is the 2009 Inter-Service Academy Cyber Defense Competition.

Unidirectional capture is common, for example, when using regenerative taps, which split tapped traffic into two links based on direction, and only one directional link makes it into the monitoring device. Most netflow systems are actually designed to operate well on simplex links, so the adverse effect is that you only get data on one direction. Simple packet-based inspection works fine, but more advanced, and usually rare, rules or operations using both directions obviously won’t work. Multi-packet payload inspection may still be possible on the visible direction, but it often requires severe assumptions to be made about reassembly, opening the door to classic IDS evasion. As such, some deep payload analysis systems, including vortex and others based on libnids, just won’t work on unidirectional data. Simplex visibility is usually pretty easy to detect and deal with, but it often goes undetected because most network monitoring equipment functions well without full duplex data.

External Verification


Probably the best strategy for verifying network tapping infrastructure is to perform some sort of comparison of data collected passively with data collected inline. This could be comparing packet counts on routers or end devices to packet counts on a network monitoring device. For higher order verification, you should do something like compare higher order network transaction logs from an inline or end device against passively collected transaction logs. For example, you could compare IIS or Apache webserver logs to HTTP transaction logs collected by an IDS such as Bro or Suricata. These verification techniques are often difficult. You’ve got to try to deal with issues such as clock synchronization and offsets (caused by buffers in tapping infrastructure or IDS devices), differences in the data sources/logs used for concordance, etc. This is not trivial, but often can be done.
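
As a rough illustration of the web log comparison, assuming an Apache combined-format access log and a Bro http.log (and the bro-cut utility), something like the following could compare request counts for a single URI from both vantage points. The paths and URI are hypothetical, and clock offsets and logging differences still need to be accounted for before drawing conclusions.

# Requests for one URI as recorded by the web server itself (path is hypothetical):
$ awk '$7 == "/index.html"' /var/log/httpd/access_log | wc -l

# Requests for the same URI as seen passively by Bro:
$ bro-cut uri < http.log | grep -c "^/index.html$"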

Usually the biggest barrier to external verification of tapping infrastructure is the lack of any comprehensive external data source. Many people rely on passive collection devices for their primary and authoritative network monitoring. Often times, there just isn’t another data source to which you can compare your passive network monitoring tools.

One tactic I’ve used to prove loss in taps is to use two sets of taps such that packets must traverse both taps. If one tap sees a packet traverse the network and another tap doesn’t, and both monitoring tools claim 0 packet loss, you know you’ve got a problem. I’ve actually seen situations where each of two network monitoring devices missed some packets, but the missing packets from the two traces didn’t overlap.
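
One way to make that comparison concrete, assuming you can collect pcaps from both taps over the same interval, is to reduce each packet to a key and diff the two sets. The key used below (source, destination, IP ID, and IP length) is only approximately unique, so treat this as a sketch for spotting candidate gaps rather than an exact accounting.

$ tshark -r tapA.pcap -R "ip" -T fields -e ip.src -e ip.dst -e ip.id -e ip.len | sort > tapA.keys
$ tshark -r tapB.pcap -R "ip" -T fields -e ip.src -e ip.dst -e ip.id -e ip.len | sort > tapB.keys
# Packets seen by tap A but not by tap B:
$ comm -23 tapA.keys tapB.keys | head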

Another strategy that I've heard proposed is to use some sort of periodic heartbeat, such as a ping packet with a certain byte sequence in it, which the network monitor can then observe. If the periodic heartbeat isn't observed, the network monitor can alert on its absence and the potential visibility gap can be investigated. I'm not a huge fan of this strategy. While it may work for some cases, I see many situations where alerts for visibility gaps would be caused much more frequently by conditions other than monitor visibility issues. Also, if both loss and heartbeats are a very small fraction of overall traffic, it would be possible for loss to occur without the heartbeat alert being set off. We certainly don't need another false-positive- and false-negative-prone alert to ignore.
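
For completeness, a minimal version of the heartbeat idea might look like the following; the pad pattern and hostname are arbitrary, and the tshark filter assumes the pattern survives into the ICMP payload.

# Send a periodic ping padded with a recognizable byte pattern (e.g. from cron):
$ ping -c 1 -p feedfacefeedface host-beyond-the-tap

# On the monitor, count how many heartbeats actually show up in a capture:
$ tshark -r capture.pcap -R "icmp && data contains fe:ed:fa:ce" | wc -l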

Inferring Tapping Issues


While not easy, and necessarily neither as precise nor as complete as comparison against external data, it is possible to use network monitoring tools to infer visibility gaps in the data they are seeing. Many network protocols, namely TCP, provide mechanisms specifically designed to ensure reliable transport of data, even in the event of packet drops. Unlike an endpoint, however, a passive observer can’t simply ask for a retransmission when a packet is dropped. Even so, a passive observer can use the mechanisms the endpoints use for reliable transport to infer whether it missed packets passed between endpoints. For example, if Alice sends a packet to Bob which the passive observer Eve doesn’t see, but Bob acknowledges receipt to Alice and Eve sees the acknowledgement, Eve can infer that she missed a packet.

It's important to note the distinction between network packet drops and network monitor visibility gaps. When IP networks are overloaded, they drop packets. In the vast majority of cases, it's a normal (and desirable) part of flow control for networks to drop a small number of packets. Usually, when these packets are dropped, the endpoints slow communication a bit and the dropped packet is retransmitted. On the other hand, it's almost always undesirable for your network monitoring device to not see packets that are successfully transferred through the network. While a network monitor will not see the packets dropped in the network or lost due to a visibility gap, these cases are very different. The former is characterized by lack of endpoint acknowledgement and packet retransmissions while the latter usually is accompanied by endpoint acknowledgement of the "lost" packet and a lack of re-transmission. Again, this post focuses on analyzing unwanted network monitor loss, not normal network drops.

Data and Tools


For those who would like to follow along, I’ve created 3 simple pcaps. The full pcap contains all the packets from an HTTP download of the ASCII “Alice in Wonderland” from Project Gutenberg. The loss pcap is the same except that one packet, packet 50, was removed. The half pcap is the same as the full pcap, but only contains the packets going to the server, without the packets going to the client.
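
If you want to build similar test captures from your own full pcap, something along these lines should work; the server address here is the one from my trace, so substitute your own.

# Remove a single packet (packet 50) to simulate a monitor visibility gap:
$ editcap alice_full.pcap alice_loss.pcap 50

# Keep only the packets going to the server to simulate simplex capture:
$ tcpdump -r alice_full.pcap -w alice_half.pcap "dst host 152.46.7.81"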

In addition to these pcaps I'll also use a larger pcap of about 100MB which contains 125338 packets. This packet trace was collected on a low rate network where I have confidence that no packets were lost in the taps or the monitor (but normal network packet drops are likely). Using editcap, I've created 2 altered versions of this capture where I removed 100 and 500 packets respectively:

$ capinfos example.pcap | grep "^Number"
Number of packets: 125338
$ editcap example.pcap ex_loss_100.pcap `for i in {1..100}; do echo $[ ( $RANDOM * 125338 ) / 32767 ]; done`
$ capinfos ex_loss_100.pcap | grep "^Number"
Number of packets: 125238
$ editcap example.pcap ex_loss_500.pcap `for i in {1..500}; do echo $[ ( $RANDOM * 125338 ) / 32767 ]; done`
$ /usr/sbin/capinfos ex_loss_500.pcap | grep "^Number"
Number of packets: 124842

Note that I actually only removed 496 instead of 500 in the latter trace because some packets were randomly selected for removal twice. These packet traces will not be shared publicly, but those following along should be able to use their own capture file and obtain similar results.

For tools, I’ll be using argus and tshark to infer packet loss in the tap. Argus is a network flow monitoring tool. Tshark is the CLI version of the ever popular wireshark. Since deep payload analysis systems are often greatly affected by packet loss, I’ll explain how the two types of packet loss affect vortex.

Low Volume Loss in Taps


Detecting and quantifying low volume loss can be difficult. For a long time, the most effective tool I used for measuring this was tshark, especially the tcp analysis lost_segment and ack_lost_segment flags.

Note that tshark easily identifies the lost packet at position 50:


$ tshark -r alice_full.pcap -R tcp.analysis.lost_segment
$ tshark -r alice_loss.pcap -R tcp.analysis.lost_segment
50 0.410502 152.46.7.81 -> 66.173.221.158 TCP [TCP Previous segment lost] [TCP segment of a reassembled PDU]


Unfortunately, this in and of itself doesn't help us know for sure if this packet was dropped in the network or was lost in the network monitor.

Theoretically, that's what tcp.analysis.ack_lost_segment is for.

$ tshark -r alice_full.pcap -R tcp.analysis.ack_lost_segment
$ tshark -r alice_loss.pcap -R tcp.analysis.ack_lost_segment


What's going on? This packet should have been acknowledged by the endpoint (it was originally in the trace) and the ACK for this packet was not removed. Unfortunately, this functionality doesn't always work reliably. I have seen it work sometimes, but as shown here, it doesn't work all the time. tshark does reliably flag packets that are lost, but doesn't reliably flag those that are lost but ACK'd. This differentiation is important because it allows us to separate packets that are lost in the network from those that are lost in the tapping/capture infrastructure. This bug was reported to the wireshark development team by György Szaniszló. As far as I know, the fixes that György proposed still have not been implemented. In addition to his explanation of current tshark functionality, György included an additional test pcap and a patch to fix this functionality. I recommend reading this bug report and considering using the patch he provided if you are serious about using tshark to help infer loss external to your network monitors. I'd love to see the wireshark team fix this functionality.

Please note that while the "ack_lost_segment" doesn't appear to work reliably, the "lost_segment" appears to work as expected. This can help you validate your network tapping infrastructure, especially if you can quantify the packets dropped in the network. At the very least, the loss reported here could be considered a reflection of the upper bound of the loss external to your monitor. I’ve created a simple (but inefficient) script that can be used on many pcaps. Since tshark doesn’t release memory, you’ll need to use pcap slices smaller than the amount of memory in your system. The script is as follows:


#!/bin/bash

# Read pcap file names from stdin and report the fraction of TCP packets
# flagged by tshark as lost segments, along with each capture's bandwidth.
while read file
do
    total=`tcpdump -r "$file" -nn "tcp" 2>/dev/null | wc -l`
    errors=`tshark -r "$file" -R tcp.analysis.lost_segment | wc -l`
    percent=`echo $errors $total | awk '{ print $1*100/$2 }'`
    bandwidth=`capinfos "$file" | grep "bits/s" | awk '{ print $3" "$4 }'`
    echo "$file: $percent% $bandwidth"
done


It is operated by piping it a list of pcap files. For example, here are the results from my private example traces:


ls example.pcap ex_loss_100.pcap ex_loss_500.pcap | ./calc_tcp_loss.sh
example.pcap: 0.00261199% 40351.43 bits/s
ex_loss_100.pcap: 0.0392112% 40321.12 bits/s
ex_loss_500.pcap: 0.192323% 40194.42 bits/s
...


I believe the small amount of loss in the unmodified example.pcap resulted from normal network packet drops. Note that the loss percentage reported scales up nicely with our simulated network monitor loss.

In the case of low volume loss in taps, argus historically hasn’t been the most helpful:


$ argus -X -r alice_full.pcap -w full.argus
$ ra -r full.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 121 0
$ argus -X -r alice_loss.pcap -w loss.argus
$ ra -r loss.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 120 0


Note that there is one fewer dpkt (destination packet). Other than the packet counts, there is no way to know that packet loss occurred. For as long as I've used argus, it has done a good job of identifying and quantifying normal network packet drops, as evidenced by retransmissions, etc. This is reported by flags of "s" and "d", for source and destination loss respectively, as well as various *loss stats.
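
If you want to see those normal network drops explicitly, you can ask ra for the loss fields directly. This assumes your argus-clients build supports the per-direction sloss and dloss fields implied by the wildcard above; check ra's field list if it doesn't.

$ ra -r full.argus -n -s stime flgs saddr sport daddr dport loss sloss dloss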

Very recently, however, the argus community and developers have added mechanisms to argus to directly address inference of network monitor packet loss and differentiate it from normal network packet drops. The result is a set of flags and counters for what are termed "gaps" in traffic visibility. This functionality is included in recent development versions (my examples are made using argus-3.0.5.10 and argus-clients-3.0.5.34). The inferred lapses in monitor visibility are denoted with the flag "g" and the statistics "sgap" and "dgap", which measure the bytes of an inferred gap in network traffic.

For example let's look again at the alice example, regenerating it with the new argus that has gap detection capabilities and looking at the "dgap" instead of "loss" statistic:

$ /usr/local/sbin/argus -X -r alice_full.pcap -w full.argus
$ ra -r full.argus -n -s stime flgs saddr sport daddr dport spkts dpkts dgap
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 121 0
$ argus -X -r alice_loss.pcap -w loss.argus
$ ra -r loss.argus -n -s stime flgs saddr sport daddr dport spkts dpkts dgap
10:12:54.474330 e g 66.173.221.158.55812 152.46.7.81.80 87 120 1460


Now argus correctly flags the connection as having a gap in it and identifies the size of the gap. Let's see how argus does on my private example:

$ argus -X -r example.pcap -w - | ra -nn -r - | grep g | wc -l
3
$ argus -X -r ex_loss_100.pcap -w - | ra -nn -r - | grep g | wc -l
36
$ argus -X -r ex_loss_500.pcap -w - | ra -nn -r - | grep g | wc -l
102


Argus seems to scale up nicely as the number of packets lost increases. I suspect the reason the increase in flagged flows isn't linear is that some flows may have more than one gap in them. Note that argus can't detect every lost packet, but it does detect a large portion of them. Factors such as the ratio of reliable to unreliable protocols used, the size of individual flows, the number of concurrent connections, and the percentage of packets lost parametrize the ratio of lost packets to flagged flows. I think it's safe to assume that if all these parameters are held constant and a large enough number of observations are used, one can estimate the quantity of packets lost from the number of flagged flows with a high degree of accuracy. One could also base estimates on the number of bytes in the sgap and dgap stats, but this also involves making assumptions (about the size of packets in gaps).
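
If you want a rough quantitative estimate rather than a count of flagged flows, you can sum the gap bytes and divide by an assumed segment size. The 1460-byte figure below is just the typical full-size TCP payload, exactly the kind of assumption mentioned above.

$ argus -X -r ex_loss_500.pcap -w - | ra -nn -r - -s sgap dgap | awk '{ bytes += $1 + $2 } END { print bytes " gap bytes, roughly " int(bytes/1460) " full-size packets" }'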

One concerning result of the above demonstration, however, is that there are some flows flagged as having gaps in the unmodified packet trace which I believe to be free of any network monitor loss (visibility gaps). There are a few reasons why some flows may be flagged erroneously. Probably the most common is due to a given connection being split across multiple flow records, causing a gap to be detected due to the boundary condition. This can be rectified by using longer flow status intervals so that this boundary case occurs less. Increasing the flow status interval from the default of 5 seconds to 60 seconds is enough to fix our falsely flagged flows:

$ argus -X -S 60 -r example.pcap -w - | ra -nn -r - | grep g | wc -l
0
$ argus -X -S 60 -r ex_loss_100.pcap -w - | ra -nn -r - | grep g | wc -l
25
$ argus -X -S 60 -r ex_loss_500.pcap -w - | ra -nn -r - | grep g | wc -l
64

So increasing the flow status interval is enough to have my private example report 0 flows with gaps (from which we infer network monitor loss). It also decreases the number of flagged flows in the case of legitimate loss: as more connection data is bundled into a single flow record, flagged flows may contain more than one lost packet. This functionality is very new to argus, so expect it to improve over time. The developers are now discussing how to improve things such as filtering, aggregation, and flow boundary conditions. One of the most exciting facets of this functionality being built into argus is that this data is available at no additional effort going forward. This makes periodic or even continuous validation of network monitor visibility extremely easy.
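
As a sketch of what periodic validation could look like, the fragment below counts gap-flagged flows in an argus archive file and complains above a threshold; the file path and threshold are placeholders, and the grep mirrors the crude flag matching used above.

#!/bin/bash

# Hypothetical path to a recent argus archive and an arbitrary alerting threshold.
ARGUS_FILE=/var/log/argus/argus.latest
THRESHOLD=10

# Count flows whose flags column contains a gap indication ("g").
gaps=`ra -nn -r $ARGUS_FILE -s flgs | grep g | wc -l`

if [ "$gaps" -gt "$THRESHOLD" ]; then
    echo "WARNING: $gaps flows with inferred visibility gaps in $ARGUS_FILE"
fi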

Updated 03/12/2012: See the comments section for results using Bro.

Vortex basically gives up on trying to reassemble a TCP stream if there is a packet that is lost and the TCP window is exceeded. The stream gets truncated at the first hole and the stream remains in limbo until it idles out or vortex closes.


$ vortex -r alice_full.pcap -e -t full
Couldn't set capture thread priority!
full/tcp-1-1276956774-1276956775-c-168169-66.173.221.158:55812s152.46.7.81:80
full/tcp-1-1276956774-1276956775-c-168169-66.173.221.158:55812c152.46.7.81:80
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 168169 VTX_EST: 1 VTX_WAIT: 0 VTX_CLOSE_TOT: 1 VTX_CLOSE: 1 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 0 VTX_BSF: 0

$ vortex -r alice_loss.pcap -e -t loss
Couldn't set capture thread priority!
loss/tcp-1-1276956774-1276956774-e-31056-66.173.221.158:55812s152.46.7.81:80
loss/tcp-1-1276956774-1276956774-e-31056-66.173.221.158:55812c152.46.7.81:80
VORTEX_ERRORS TOTAL: 2 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 2 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
Hint--TCP_QUEUE: Investigate possible packet loss (if PCAP_LOSS is 0 check ifconfig for RX dropped).
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 31056 VTX_EST: 1 VTX_WAIT: 0 VTX_CLOSE_TOT: 1 VTX_CLOSE: 0 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 1 VTX_BSF: 0


Note that there are fewer bytes collected, vortex warns about packet loss, there are TCP_QUEUE errors, and the stream doesn’t close cleanly in the loss pcap.

Simplex Capture


Simplex capture is actually pretty simple to identify. It’s only problematic because many tools don’t warn you if it is occurring, so you often don’t even know it is happening. The straightforward approach is to use netflow and look for flows with packets in only one direction.


$ argus -X -r alice_half.pcap -w half.argus
$ ra -r half.argus -n -s stime flgs saddr sport daddr dport spkts dpkts loss
10:12:54.474330 e 66.173.221.158.55812 152.46.7.81.80 87 0 0


This couldn’t be more clear. There are only packets in one direction. If you use a really small flow record interval, you’ll want to do some flow aggregation to ensure you will get packets from both directions in a given flow record. Note that argus by default creates bidirectional flow records. If your netflow system does unidirectional flow records, you need to do a little more work like associating the two unidirectional flows and making sure both sides exist.
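
On a larger archive you can apply the same idea mechanically: print the per-direction packet counts and filter for flows where either side is zero. The awk column numbers assume exactly the field list shown.

$ ra -r half.argus -n -s saddr daddr spkts dpkts | awk '$3 == 0 || $4 == 0'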

You can use one of many tools, such as tcpdump or tshark, and see that for a given connection, you only see packets in one direction.
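
For example, tshark's conversation statistics make the asymmetry obvious: a frame count of zero in one direction is a strong hint of simplex capture.

$ tshark -r alice_half.pcap -q -z conv,tcp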

Vortex handles simplex network traffic in a straightforward, albeit somewhat lackluster, manner--it just ignores it. LibNIDS, on which vortex is based, is designed to overcome NIDS TCP evasion techniques by exactly mirroring the functionality of a TCP stack, but it assumes full visibility (no packet loss) to do so. If it doesn’t see both sides of a TCP handshake, it won’t follow the stream because a full handshake hasn’t occurred. As such, the use of vortex on the half pcap is rather uneventful:


$ vortex -r alice_half.pcap -e -t half
Couldn't set capture thread priority!
VORTEX_ERRORS TOTAL: 0 IP_SIZE: 0 IP_FRAG: 0 IP_HDR: 0 IP_SRCRT: 0 TCP_LIMIT: 0 TCP_HDR: 0 TCP_QUE: 0 TCP_FLAGS: 0 UDP_ALL: 0 SCAN_ALL: 0 VTX_RING: 0 OTHER: 0
VORTEX_STATS PCAP_RECV: 0 PCAP_DROP: 0 VTX_BYTES: 0 VTX_EST: 0 VTX_WAIT: 0 VTX_CLOSE_TOT: 0 VTX_CLOSE: 0 VTX_LIMIT: 0 VTX_POLL: 0 VTX_TIMOUT: 0 VTX_IDLE: 0 VTX_RST: 0 VTX_EXIT: 0 VTX_BSF: 0


The most optimistic observer will point out that at least vortex makes it clear when you don’t have full duplex traffic--because you see nothing.

Conclusion


I hope the above is helpful to others who rely on passive network monitoring tools. I’ve discussed the two most prevalent tapping issues I’ve seen personally. One topic I’ve intentionally avoided, because it’s hard to discuss and debug, is interleaving of aggregated taps, especially issues with timing. For example, assume you do some amount of tap aggregation, especially aggregation of simplex flows, either using an external tap aggregator or bonded interfaces inside your network monitoring system. If enough buffering occurs, packets from each simplex feed may be interleaved incorrectly; a SYN-ACK may end up in front of the corresponding SYN. There are other subtle tapping issues, but the two I discussed above are by far the most prevalent problems I’ve seen.

Verifying or quantifying the loss in your tapping infrastructure even once is above and beyond what many organizations do. If you rely heavily on the validity of your data, consider doing this periodically or automatically so you detect any changes or failures. Even better, for those that heavily rely on their network monitors, building this self-validation into the tools themselves seems like the right thing to do.

5 comments:

  1. @JustinAzoff tweeted questioning how what I've described in this post compares to the capture-loss policy of Bro.

    I wasn't aware of the capture-loss policy in Bro. Parenthetically, I'm excited about the recent focus on usability and documentation for Bro, whose functionality is often overlooked because of these factors.

    Here are the results for Bro 2.0 using the capture-loss policy. We're primarily concerned with column 4, which is the number of gaps inferred by TCP ack analysis.


    $ bro -b -r alice_full.pcap capture-loss.bro
    $ tail -n 3 capture_loss.log
    #fields ts ts_delta peer gaps acks percent_lost
    #types time interval string count count string
    1276956775.391936 0.917606 bro 0 83 0.000%
    $ bro -b -r alice_loss.pcap capture-loss.bro
    $ tail -n 1 capture_loss.log
    1276956775.391936 0.917606 bro 1 83 1.205%


    This works well on the alice example. Let's see how it works on my private example:

    $ bro -b -r example.pcap capture-loss.bro
    $ cat capture_loss.log | awk '{ if ($3 == "bro") { SUM+=$4 } } END { print SUM }'
    0
    $ bro -b -r ex_loss_100.pcap capture-loss.bro
    $ cat capture_loss.log | awk '{ if ($3 == "bro") { SUM+=$4 } } END { print SUM }'
    46
    $ bro -b -r ../ex_loss_500.pcap capture-loss.bro
    $ cat capture_loss.log | awk '{ if ($3 == "bro") { SUM+=$4 } } END { print SUM }'
    255


    In my estimation this compares very well to the results from argus above. Bro reports counts for individual gaps while argus flags on a per-flow basis, but neither can reasonably detect all packets lost due to monitor failure--again, this is based on inference and only works to detect some packets.

    Also note that Bro not only creates a nice log of inferred loss rates but can alert if the loss rate crosses a certain threshold.

  2. Hi Charles,

    Great post here. I wonder if you have ever encountered the case where the two directions of a TCP connection are not presented in perfect sync to the app.

    I had a lot more to say so I wrote a blog post on this subject http://www.unleashnetworks.com/blog/?p=437

    1. Vivek,

      Thanks for the comment and link.

      I'm not sure I've ever seen a case where I was confident that packets were reassembled out of order. I have considered it theoretically possible, and I remember questioning in the past whether it was occurring. The scenarios you mentioned are the situations where I've considered it to be more likely. I've always assumed this is rare in practice and that it would be very hard to prove, even if it was happening. I expect most network defenders don't have to worry about this sort of thing due to the tendency to tap relatively high-bandwidth links relatively far away from end nodes (ex. network gateways).

      This does bring up a valid question. Is buffer bloat a problem for passive capture? Naïveté seems to dictate that when doing passive network capture, the larger the buffers, the better. Latency doesn't really matter but dropping packets is bad. However, if your capture buffers are so bloated that you end up with interleaving issues on combined taps, you obviously have a problem. It would seem the issue isn't only queue lengths, but also things like interrupt coalescing.

      That being said, I am confident that this re-ordering issue occurs much less frequently in practice than the issues I’ve discussed here. While possibly not the best sampling, this issue doesn’t seem nearly as prevalent in public traces as the two issues I’ve addressed in this post. I did find one public packet trace, http://jnetpcap.svn.sourceforge.net/viewvc/jnetpcap/jnetpcap/branches/branch-1.3/tests/test-http-jpeg.pcap with a relatively compelling, if somewhat convoluted, explanation, http://jnetpcap.com/node/153. I have used wireshark to identify/confirm packet gaps with success (despite the limitations) but don’t think I’ve ever seen a case where the packets were out of order. More importantly, I have performed extensive external validation of various network monitor systems and have discovered issues that I’ve traced back to the two causes listed above. I’ve never seen significant issues that I traced back to re-ordering.

      Do you have packet traces exhibiting re-ordering that you can share? Do you have rules of thumb for when it is an issue? If so, you might consider addressing this condition in more depth. I can’t speak for the whole community, many of whom probably consider my post already pretty far out on the esoteric spectrum, but I’d be interested.

  3. Charles,

    In my test lab with a full-duplex tap I am seeing the re-ordering issue you brought up in your Conclusion and in the comment above. I am using the Linux bonding functionality to aggregate the traffic. Are you aware of any way to make the bond deliver the traffic in the correct order? We do a similar thing with bridging on FreeBSD, and it never gets the packets out of order.

    I am using CentOS 6.3 (2.6.32 kernel) and the igb network driver and monitoring via a full-duplex tap. I'm using the Linux bonding functionality to aggregate the traffic for Snort.

    From a test client I am issuing an HTTP GET to a test server. I am seeing the following packet ordering:

    C->S SYN
    C->S ACK
    C->S HTTP GET
    S->C SYN/ACK
    S->C HTTP Response

    Because Snort is seeing the HTTP GET before the 3-way handshake is completed, Snort isn't analyzing the HTTP session.

    I agree that in most real deployments this isn't an issue due to the longer round-trip-time of the traffic being monitored. However I would like to find a way to eliminate this possibility altogether if possible.

    1. James,

      That’s pretty interesting that you are observing this. I’ve never been able to confirm the scenario you are seeing myself, so I can’t speak authoritatively about the best solution.

      It seems probable that your issue is caused by interrupt coalescing or new api (NAPI) style interrupt polling—probably the former. As such, one way to fix this would be to play with the interrupt coalesce (ethtool -c/-C) settings. For example, if you disabled interrupt coalescing or dropped the interrupt interval to a low enough level on your bonded interfaces, I’d expect this artifact to go away. However, on the flip side you’d be harming performance in the case of high rate packet capture. Note that the issue you’re observing isn’t solely caused by low RTT of your test lab—it’s also enabled by the low packet rate of your packet captures. Given the same RTT, if you had a very high packet rate, I’d expect this artifact to go away because the packet capture queues would have to be emptied more frequently to avoid packet loss.
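
      For reference, the kind of adjustment I have in mind looks something like the following; rx-usecs is a common coalescing parameter for the igb driver, but supported parameter names vary by NIC and driver, and the interface names are placeholders.

      # Show the current interrupt coalescing settings on one of the bonded slaves:
      $ ethtool -c eth2

      # Lower (or disable) the receive interrupt delay on both slave interfaces:
      $ ethtool -C eth2 rx-usecs 0
      $ ethtool -C eth3 rx-usecs 0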

      Another option would be to make your test network look more like a real network. For example, you could generate additional traffic with something like tcpreplay or iperf. It would be interesting to see how many packets per second you’d need to make this phenomenon go away. You could try to increase latency, ex. by adding a switch, but I’m not sure you’d be able to increase the latency enough to make a noticeable difference.

      I agree with your assessment. While this is probably quite an annoyance in your testing, I wouldn’t expect you to see similar issues on a production network with a large amount of traffic. If you tune your interrupt timings for your low latency, low volume test lab you’ll probably have to undo or at least modify the settings on a production network (which may well be worth the effort so you can do proper testing of your monitor configuration in your lab). Notwithstanding the annoyance, the combo of EL6, bonding, and igb should be a very solid solution—one that I think you should have reasonable confidence in once you deploy your monitor to a real network.

      Would you be willing to share/post a pcap of the phenomena you are experiencing? I’d certainly be interested in looking at the timings on the packets and I think others in the community would appreciate it too.
