
Offline packet capture analysis with C/C++ & libpcap

The Overview


At the request of one of my faithful readers in my original article on packet capture with libpcap, I decided to post a guide to offline packet capture processing. Why is this useful? Because popular packet capture programs like Wireshark or tcpdump can save captures to files that can be processed later. You can then apply your specialized code to these previously captured packets.

The Code

NOTE: This program makes use of the http.cap Wireshark packet capture sample.

#include <iostream>
#include <pcap.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
 
using namespace std;
 
void packetHandler(u_char *userData, const struct pcap_pkthdr* pkthdr, const u_char* packet);
 
int main() {
        pcap_t *descr;
        char errbuf[PCAP_ERRBUF_SIZE];
 
        // open capture file for offline processing
        descr = pcap_open_offline("http.cap", errbuf);
        if (descr == NULL) {
                cout << "pcap_open_offline() failed: " << errbuf << endl;
                return 1;
        }
 
        // start packet processing loop, just like live capture
        if (pcap_loop(descr, 0, packetHandler, NULL) < 0) {
                cout << "pcap_loop() failed: " << pcap_geterr(descr) << endl;
                return 1;
        }
 
        cout << "capture finished" << endl;
 
        return 0;
}
 
void packetHandler(u_char *userData, const struct pcap_pkthdr* pkthdr, const u_char* packet) {
        const struct ether_header* ethernetHeader;
        const struct ip* ipHeader;
        const struct tcphdr* tcpHeader;
        char sourceIp[INET_ADDRSTRLEN];
        char destIp[INET_ADDRSTRLEN];
        u_int sourcePort, destPort;
        u_char *data;
        int dataLength = 0;
        string dataStr = "";
 
        ethernetHeader = (struct ether_header*)packet;
        if (ntohs(ethernetHeader->ether_type) == ETHERTYPE_IP) {
                ipHeader = (struct ip*)(packet + sizeof(struct ether_header));
                inet_ntop(AF_INET, &(ipHeader->ip_src), sourceIp, INET_ADDRSTRLEN);
                inet_ntop(AF_INET, &(ipHeader->ip_dst), destIp, INET_ADDRSTRLEN);
 
                if (ipHeader->ip_p == IPPROTO_TCP) {
                        tcpHeader = (tcphdr*)(packet + sizeof(struct ether_header) + sizeof(struct ip));
                        sourcePort = ntohs(tcpHeader->source);
                        destPort = ntohs(tcpHeader->dest);
                        data = (u_char*)(packet + sizeof(struct ether_header) + sizeof(struct ip) + sizeof(struct tcphdr));
                        dataLength = pkthdr->len - (sizeof(struct ether_header) + sizeof(struct ip) + sizeof(struct tcphdr));
 
                        // convert non-printable characters, other than carriage return, line feed,
                        // or tab into periods when displayed.
                        for (int i = 0; i < dataLength; i++) {
                                if ((data[i] >= 32 && data[i] <= 126) || data[i] == 9 || data[i] == 10 || data[i] == 13) {
                                        dataStr += (char)data[i];
                                } else {
                                        dataStr += ".";
                                }
                        }
 
                        // print the results
                        cout << sourceIp << ":" << sourcePort << " -> " << destIp << ":" << destPort << endl;
                        if (dataLength > 0) {
                                cout << dataStr << endl;
                        }
                }
        }
}


The Breakdown

#include <iostream>
#include <pcap.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
 
using namespace std;
 
void packetHandler(u_char *userData, const struct pcap_pkthdr* pkthdr, const u_char* packet);

These are the includes and declarations necessary for reading packet captures. The first two are self-explanatory; the following five may be less so. They provide the functions and structures used to parse and transform the data found in packets, and they are available natively on Linux systems (Ubuntu in this case).


int main() {
        pcap_t *descr;
        char errbuf[PCAP_ERRBUF_SIZE];
 
        // open capture file for offline processing
        descr = pcap_open_offline("http.cap", errbuf);
        if (descr == NULL) {
                cout << "pcap_open_offline() failed: " << errbuf << endl;
                return 1;
        }

After entering the main execution, we go straight to opening our target packet capture file, http.cap. To do this we use pcap_open_offline() and give it the capture filename and an error buffer as parameters. If all goes well, we get a pcap_t descriptor returned. If not, check the error buffer for details.
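libpcap takes care of the file format for you, but it can help to know roughly what pcap_open_offline() has to read first. A classic pcap file starts with a 24-byte global header: a magic number that reveals the byte order and timestamp resolution, a version, a snapshot length, and a link-layer type (1 for Ethernet). The sketch below decodes that header from a raw buffer. The names PcapFileHeader and parsePcapFileHeader are made up for this example and are not part of libpcap, and the newer pcapng format is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for this sketch; not a libpcap type.
struct PcapFileHeader {
    bool bigEndian;         // byte order the file was written in
    bool nanosecond;        // true if timestamps use nanosecond resolution
    uint16_t versionMajor;
    uint16_t versionMinor;
    uint32_t snaplen;       // maximum number of bytes saved per packet
    uint32_t linkType;      // 1 means Ethernet
};

static uint16_t read16(const unsigned char* p, bool bigEndian) {
    return bigEndian ? uint16_t((p[0] << 8) | p[1]) : uint16_t((p[1] << 8) | p[0]);
}

static uint32_t read32(const unsigned char* p, bool bigEndian) {
    return bigEndian
        ? (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | uint32_t(p[3])
        : (uint32_t(p[3]) << 24) | (uint32_t(p[2]) << 16) | (uint32_t(p[1]) << 8) | uint32_t(p[0]);
}

// Decodes the 24-byte global header of a classic pcap file.
// Returns false if the buffer is too short or the magic number is unknown.
bool parsePcapFileHeader(const unsigned char* buf, std::size_t len, PcapFileHeader& out) {
    assert(buf != nullptr);
    if (len < 24) return false;

    // Read the magic as big-endian; the value tells us how the file was written.
    switch (read32(buf, true)) {
        case 0xa1b2c3d4: out.bigEndian = true;  out.nanosecond = false; break;
        case 0xd4c3b2a1: out.bigEndian = false; out.nanosecond = false; break;
        case 0xa1b23c4d: out.bigEndian = true;  out.nanosecond = true;  break;
        case 0x4d3cb2a1: out.bigEndian = false; out.nanosecond = true;  break;
        default: return false;
    }

    out.versionMajor = read16(buf + 4, out.bigEndian);
    out.versionMinor = read16(buf + 6, out.bigEndian);
    // Bytes 8..15 hold the time zone offset and timestamp accuracy, skipped here.
    out.snaplen  = read32(buf + 16, out.bigEndian);
    out.linkType = read32(buf + 20, out.bigEndian);
    return true;
}
```

In real code you would not parse this yourself; after a successful pcap_open_offline() the same information is available through libpcap calls such as pcap_datalink() and pcap_snapshot().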


        // start packet processing loop, just like live capture
        if (pcap_loop(descr, 0, packetHandler, NULL) < 0) {
                cout << "pcap_loop() failed: " << pcap_geterr(descr) << endl;
                return 1;
        }
 
        cout << "capture finished" << endl;
 
        return 0;
}

Just like in a live packet capture, we use pcap_loop() to set up a handler callback for each packet to be processed. We give it the following:

  • descr – the descriptor we just created with pcap_open_offline()
  • count – 0 (zero), to indicate there is no limit to the number of packets we want to process
  • callback – The name of our packet handler function
  • userdata – NULL, to indicate that we will be passing no user defined data to the callback

When the entire file has been processed, we print the “capture finished” message and then exit.


void packetHandler(u_char *userData, const struct pcap_pkthdr* pkthdr, const u_char* packet) {
        const struct ether_header* ethernetHeader;
        const struct ip* ipHeader;
        const struct tcphdr* tcpHeader;
        char sourceIp[INET_ADDRSTRLEN];
        char destIp[INET_ADDRSTRLEN];
        u_int sourcePort, destPort;
        u_char *data;
        int dataLength = 0;
        string dataStr = "";

Here we define the packet handler callback, as per the libpcap specifications. For more details, check out my original post on packet capture. The following declarations define variables that will help us parse meaningful data out of the packets. These include packet header data, IP addresses, source/destination ports, and payload data.

There’s LOTS more useful information to be analyzed from the average packet. Check out the structures defined in the network includes at the beginning of the code for more details. Actually, it would probably be a hell of a lot easier to just download and fire up Wireshark. It will give you a greater appreciation for what can be learned from a packet.


        ethernetHeader = (struct ether_header*)packet;
        if (ntohs(ethernetHeader->ether_type) == ETHERTYPE_IP) {
                ipHeader = (struct ip*)(packet + sizeof(struct ether_header));
                inet_ntop(AF_INET, &(ipHeader->ip_src), sourceIp, INET_ADDRSTRLEN);
                inet_ntop(AF_INET, &(ipHeader->ip_dst), destIp, INET_ADDRSTRLEN);

I’m not going to delve too deeply into the specifics of network protocols, as that could be a post… check that… that could be a book of its own. Basically, here we are parsing the Ethernet header from the packet and using its type to determine whether it is an IP packet. We use ntohs() to convert the type from network byte order to host byte order.

If it is an IP packet, we parse out the IP header and use the inet_ntop() function to convert the IP addresses found in the IP header into a human-readable format (i.e., xxx.xxx.xxx.xxx). In a lot of older examples you’ll see inet_ntoa() used instead, but it returns a pointer to a static buffer, so it is not thread-safe, and it is deprecated in favor of inet_ntop().
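Here is the same Ethernet and IP step as a small standalone function, in case you want to try it on a hand-built buffer without a capture file. It is only a sketch: describeIpv4Frame is a name made up for this example, it assumes an untagged Ethernet frame, and it copies the headers out with memcpy() rather than casting the packet pointer so that alignment is not a concern.

```cpp
#include <arpa/inet.h>     // ntohs(), inet_ntop()
#include <net/ethernet.h>  // struct ether_header, ETHERTYPE_IP
#include <netinet/in.h>    // INET_ADDRSTRLEN
#include <netinet/ip.h>    // struct ip
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns "src -> dst" for an IPv4-over-Ethernet frame,
// or an empty string if the frame is too short or is not IPv4.
std::string describeIpv4Frame(const unsigned char* frame, std::size_t caplen) {
    assert(frame != nullptr);
    if (caplen < sizeof(struct ether_header) + sizeof(struct ip)) return "";

    struct ether_header eth;
    std::memcpy(&eth, frame, sizeof eth);
    // ether_type is stored in network byte order.
    if (ntohs(eth.ether_type) != ETHERTYPE_IP) return "";

    struct ip ipHeader;
    std::memcpy(&ipHeader, frame + sizeof eth, sizeof ipHeader);

    char src[INET_ADDRSTRLEN];
    char dst[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &ipHeader.ip_src, src, sizeof src) == NULL ||
        inet_ntop(AF_INET, &ipHeader.ip_dst, dst, sizeof dst) == NULL) {
        return "";
    }
    return std::string(src) + " -> " + dst;
}
```

Inside the packet handler you could call it as describeIpv4Frame(packet, pkthdr->caplen), passing caplen (the bytes actually stored in the file) rather than len.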


                if (ipHeader->ip_p == IPPROTO_TCP) {
                        tcpHeader = (tcphdr*)(packet + sizeof(struct ether_header) + sizeof(struct ip));
                        sourcePort = ntohs(tcpHeader->source);
                        destPort = ntohs(tcpHeader->dest);
                        data = (u_char*)(packet + sizeof(struct ether_header) + sizeof(struct ip) + sizeof(struct tcphdr));
                        dataLength = pkthdr->len - (sizeof(struct ether_header) + sizeof(struct ip) + sizeof(struct tcphdr));

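Similar to above, I use the IP header to determine if this is a TCP packet (they all should be, since it’s an HTTP capture) and then parse out the TCP header. With the TCP header we can then determine the source and destination ports, with ntohs() again, and then locate the contents of the packet payload. Note that the offsets here use sizeof(struct ip) and sizeof(struct tcphdr), which assumes neither header carries options. Both headers are variable-length, so when options are present the payload actually starts later than this code expects.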
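A more careful way to find the payload is to read the header lengths from the packet itself. The sketch below works on raw bytes so it does not depend on the Linux or BSD field names: the low nibble of the first IP byte is the header length in 32-bit words (the ip_hl field), and the high nibble of byte 12 of the TCP header is the data offset (doff on Linux, th_off on BSD). TcpPayload and locateTcpPayload are names invented for this example, and it assumes an untagged Ethernet frame that has already been identified as IPv4 and TCP.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
struct TcpPayload {
    std::size_t offset;  // where the payload starts, counted from the start of the frame
    std::size_t length;  // payload bytes actually present in the captured data
};

// Returns false if the captured bytes are too short or the header fields are inconsistent.
bool locateTcpPayload(const unsigned char* frame, std::size_t caplen, TcpPayload& out) {
    assert(frame != nullptr);
    const std::size_t ethLen = 14;                      // untagged Ethernet header
    if (caplen < ethLen + 20) return false;             // need at least a minimal IP header

    const unsigned char* ipHdr = frame + ethLen;
    const std::size_t ipLen = (ipHdr[0] & 0x0f) * 4;    // IP header length in bytes
    const std::size_t ipTotal = (std::size_t(ipHdr[2]) << 8) | ipHdr[3];  // IP total length
    if (ipLen < 20 || caplen < ethLen + ipLen + 20) return false;

    const unsigned char* tcpHdr = ipHdr + ipLen;
    const std::size_t tcpLen = (tcpHdr[12] >> 4) * 4;   // TCP header length in bytes
    if (tcpLen < 20 || ipTotal < ipLen + tcpLen) return false;

    out.offset = ethLen + ipLen + tcpLen;
    // Using the IP total length excludes any Ethernet padding after the datagram.
    out.length = ipTotal - ipLen - tcpLen;

    // Never report more bytes than the capture file actually holds.
    if (out.offset > caplen) return false;
    if (out.offset + out.length > caplen) out.length = caplen - out.offset;
    return true;
}
```

In the handler this would be called with pkthdr->caplen rather than pkthdr->len, since caplen is the number of bytes that were really saved for the packet. With a TCP header that carries 12 bytes of options, the payload offset comes out as 66 instead of the 54 that the fixed-size arithmetic gives.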


                        // convert non-printable characters, other than carriage return, line feed,
                        // or tab into periods when displayed.
                        for (int i = 0; i < dataLength; i++) {
                                if ((data[i] >= 32 && data[i] <= 126) || data[i] == 9 || data[i] == 10 || data[i] == 13) {
                                        dataStr += (char)data[i];
                                } else {
                                        dataStr += ".";
                                }
                        }
 
                        // print the results
                        cout << sourceIp << ":" << sourcePort << " -> " << destIp << ":" << destPort << endl;
                        if (dataLength > 0) {
                                cout << dataStr << endl;
                        }
                }
        }
}

In the final step of the packet handler we display the results of our rudimentary analysis. First we iterate through the bytes of the payload and save them in a human-friendly format. If you try to print the payload with the non-printable characters still in there, you will get some very messy results in your console. After this cleanup we simply output the packet data we have extracted to the console.
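If you want to reuse the cleanup step elsewhere, it can be pulled out into a small function. This is just one way to write it: toDisplayable is a made-up name, and it keeps printable ASCII plus tab (9), line feed (10), and carriage return (13), replacing everything else with a period, as the comment in the handler describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of the payload that is safe to print.
std::string toDisplayable(const unsigned char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char c = data[i];
        const bool printable = c >= 32 && c <= 126;
        const bool whitespace = c == '\t' || c == '\n' || c == '\r';
        out += (printable || whitespace) ? static_cast<char>(c) : '.';
    }
    return out;
}
```

Working on unsigned char matters here: with plain char, bytes above 127 may be negative on some platforms and the range check would behave differently.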

The Summary

So now that you can process packets offline, what do you want to do with them? I don’t know about you, but aside from obvious applications to network analysis, I’d like to use this data for trending, visualization, or even generative art and sound. But then again I’m weird. What are you gonna do?

11 Responses to “Offline packet capture analysis with C/C++ & libpcap”

  1. Mina says:

    While a good exercise in programmatically using libpcap and the linux packet handling functions, the same can be accomplished with existing CLI tools:

    tcpdump -n -s 0 -w http.pcap tcp and port 80

    The file can then be read with any pcap-compatible tool, including wireshark or tcpdump itself again:
    tcpdump -r http.pcap

    While we’re at it, I’ve combined tcpdump with netcat to be able to capture packets on a device without much storage space, and forward it real-time to a better-equipped host:

    On the remote host:
    nc -l -p 9988 > http.pcap

    And on the host to sniff:
    tcpdump -n -s 0 -w - tcp and port 80 | nc remotehost 9988

  2. Awesome information Mina! I used a similar workflow back in my days as a network security analyst. You are right, this is purely an exercise and a guide to creating more elaborate projects. There’s lots of ways to build on it, including some of the ideas I mentioned in the summary.

    Storage in a database is another one that can be extremely useful, but is too expensive in heavy traffic environments to do inline. I learned that one the hard way a long time ago trying to use Snort IDS’s mysql plugin for output. Shortly after I ended up writing a custom version of barnyard to handle unified(2) output.

  3. Andy Fields says:

    Tony,

    Here’s a plug – you should check out OPNET’s APM Xpert Suite – I think you will find these COTS solutions very compelling for packet capture & analysis (real-time).

    I’m from Pittsburgh area originally and happy to pass along our local rep there if interested in a demonstration.
    http://www.opnet.com/solutions/application_performance/

    Enjoy!
    -Andy Fields
    http://www.painpoint.com

    PS: I really like your idea of generative art or music from packet streams; I’ve thought about this too and am interested if you know anyone out there that’s done this yet.

  4. Andy,

    Have you ever seen Packet Garden? http://ljudmila.org/~julian/pg/
    It’s an old project and I think it might be dead, but it’s a cool example of generative visuals based on packet data. It creates a “planet” and terraforms it based on network data.

    I was toying with the idea of writing a capture file parser in AS3 and using some of the code available at wonderfl.net to do sound generation based on the packets. Check out the profile of Keim_at_Si http://wonderfl.net/user/keim_at_Si/codes to see what I mean. It’s some pretty wild stuff.

  5. Josh says:

    That’s a really great idea about using pcaps to produce sound…I wonder how you could do that? I’m familiar with the trick of cat $something > /dev/dsp, but what if you wanted to use jack? Or maybe make some MIDI data out of it? Hmmm…

  6. Josh,

    Make it happen! I’d love to see some low level examples of churning out sound data based on packet captures. It would be awesome if you can find a way to normalize the packet data in such a way that you could identify certain types of traffic based on sounds.

    *BEEEEEEEEEEEEEEEEEEEEP* Oh, sounds like a portscan to me :)

  7. sajid says:

    hi i am getting these errors when i tried to display data please help

    invalid operands to binary + (have ‘char[100]’ and ‘int’)
    invalid operands to binary + (have ‘char[100]’ and ‘char *

    also please tell me are you displaying binary data
    thanks

  8. says:

    Nice post…

    For those of you trying to replicate this on a Mac, note that a couple of modifications are required to get it off the ground due to variations between Linux and BSD tcphdr formats (http://en.wikipedia.org/wiki/Tcphdr).

    #Orig Code Lines
    u_int sourcePort, destPort;
    sourcePort = ntohs(tcpHeader->source);
    destPort = ntohs(tcpHeader->dest);

    #Modified Code Lines
    u_short sourcePort, destPort;
    sourcePort = ntohs(tcpHeader->th_sport);
    destPort = ntohs(tcpHeader->th_dport);

    Also, I didn’t see it explicitly noted above, but for me I used the following command to compile the code. Hopefully it helps remove a roadblock for someone else.

    g++ code.c -lpcap -o code

    @claudijd

  9. PAul says:

    There is a glaring error in this code.

    ip headers are NOT fixed size. So:
    data = (u_char*)(packet + sizeof(struct ether_header) + sizeof(struct ip) + sizeof(struct tcphdr));

    Will NOT give you the TCP data segment. When I run this, the TCP data segment is offset by 12 bytes on one stream and something else on the other stream.

    You need to actually read the IP Header to get its length.

  10. dadaş says:

    nice comment………

  11. muman613 says:

    PAul,

    I recently discovered that the actual offset to the data is based on the tcp header.

    pData = (unsigned char*)(packet + size_ethernet + size_ip + (tcpptr->doff * 4)) ;

    This correctly adjusts the pointer to the packet data according to the tcp header ‘data offset/doff’ field.

    I hope that this is of help, although almost a year late…

    muman613