Triaging Large Packet Captures - Methods for Extracting & Analyzing Domains

In the recent post Triaging Large Packet Captures - 4 Key TShark Commands to Start Your Investigation, I discussed some areas to begin investigating a large packet capture. Generally, when confronted with a large PCAP with unknown behavior in it, we want to start whittling away chunks to find areas to focus our analysis on. As a general strategy it's important to understand the infrastructure used in the PCAP as well as the protocols that are being used. In this post I'll focus on examining infrastructure by extracting domains from the PCAP. I will also show how these domains can be compared against the Cisco Umbrella Popularity lists.

As previously discussed, a great starting point for extracting domains is to use the TShark command:

tshark -q -r <pcap> -z hosts

Figure 1. TShark hosts sample output (opens with the header lines "# TShark hosts output" and "# Host data gathered from <pcap>", followed by IP address and hostname pairs).

This command produces a list of IP addresses and domains, derived mostly from DNS responses in which a domain is resolved to an IP address. Each IP address in the output is the resolution for its corresponding domain in a DNS response. However, the appearance of an IP address in the output doesn’t mean that the IP address is involved in any of the conversations in the PCAP, only that it was seen as a resolution in a DNS response.

For the most part, this command gets us the bulk of what we are looking for; however, there is some nuance to it, and there are other places to find additional domains. For example, the hosts list may contain hostnames and IPs that weren’t directly queried but were instead the result of a CNAME response to the original query. This happens when you query the A record for one name and the response returns a different name as its CNAME, which in turn resolves to a given IP address. In the hosts output you would see only the CNAME target and its IP address, with no evidence of the original lookup. Additionally, you may want to see domains that aren’t expected to resolve at all, such as names queried for TXT records as part of DNS tunneling. If you want to know all the domains that were queried, you will need to extract them from the DNS queries directly.
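If you want to work with the hosts output programmatically, the following minimal Python sketch groups it by IP address. It assumes the hosts-file layout the command prints (header lines starting with #, then an IP address followed by a hostname); the function name parse_tshark_hosts is hypothetical and not part of TShark. Keep the CNAME caveat in mind: the names it collects are resolution targets, not necessarily the names that were queried.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_tshark_hosts(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each IP address in `tshark -z hosts` output to its hostnames.

    Assumes a hosts-file layout: '#' header/comment lines, blank lines,
    and entries of an IP address followed by whitespace and a hostname.
    """
    ip_to_hosts: Dict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # header, comment, or blank line
        fields = line.split()
        if len(fields) < 2:
            continue  # not an "IP hostname" entry
        ip, *hostnames = fields
        # DNS names are case-insensitive, so normalise to lower case.
        ip_to_hosts[ip].update(h.lower() for h in hostnames)
    return dict(ip_to_hosts)
```

You could feed it the saved output of the hosts command (an open file object works, since it iterates over lines) and take the union of the returned sets to get a flat list of hostnames.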

Extracting From DNS Queries

To extract all the content from the name portion of a DNS query, use the following command:

tshark -r <pcap> -T fields -e dns.qry.name

This will output all queried names from any traffic identified as DNS in the PCAP. Because DNS responses repeat the question section, each name would otherwise show up once for the query and again for the response. You can clean up the output by keeping only queries and ignoring responses:

tshark -r <pcap> -T fields -e dns.qry.name -Y "dns.flags.response eq 0"
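If you'd rather drive these extractions from Python, a small wrapper around subprocess does the job. This is a sketch assuming tshark is installed and on your PATH; build_tshark_fields_cmd and extract_field are illustrative names, not an existing library.

```python
import subprocess
from typing import List, Optional


def build_tshark_fields_cmd(
    pcap: str, field: str, display_filter: Optional[str] = None
) -> List[str]:
    """Build a tshark argument list that prints one field per packet."""
    cmd = ["tshark", "-r", pcap, "-T", "fields", "-e", field]
    if display_filter:
        # -Y applies a display filter to the packets being read.
        cmd += ["-Y", display_filter]
    return cmd


def extract_field(
    pcap: str, field: str, display_filter: Optional[str] = None
) -> List[str]:
    """Run tshark (assumed to be on PATH) and return its output lines."""
    result = subprocess.run(
        build_tshark_fields_cmd(pcap, field, display_filter),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tshark fails
    )
    return result.stdout.splitlines()
```

For example, extract_field("capture.pcap", "dns.qry.name", "dns.flags.response eq 0") should return the same lines the shell command prints. Passing the arguments as a list also means the display filter never has to be shell-quoted.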

An important caveat here is that you may get some garbage output from this query if there are any malformed DNS packets or traffic misidentified as DNS. You will likely need to do a little cleaning up of the output to focus on only the unique names queried. If you are using bash, it can be handy to remove empty lines with sed, sort the output, and then keep only the unique lines, like so:

tshark -r <pcap> -T fields -e dns.qry.name -Y "dns.flags.response eq 0" | sed '/^$/d' | sort | uniq
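The same clean-up can be done in Python, with an extra pass to throw out obvious garbage from malformed or misidentified DNS packets. This is a sketch under a few assumptions: the hostname check is deliberately loose (it is not a full RFC validator), names are lower-cased because DNS is case-insensitive (unlike uniq, so mixed-case duplicates collapse into one), and comma-separated values are split apart because a comma is tshark's default separator when a field occurs more than once in a packet.

```python
import re
from typing import Iterable, List

# One DNS label: letters, digits, hyphens (not at either end) and
# underscores, which legitimately appear in names such as SRV records.
_LABEL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?")


def looks_like_hostname(name: str) -> bool:
    """Loose sanity check used to drop garbage from malformed packets."""
    name = name.rstrip(".")
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def unique_names(lines: Iterable[str]) -> List[str]:
    """Drop blanks and garbage, de-duplicate, and sort (like sort | uniq)."""
    names = set()
    for line in lines:
        # tshark joins multiple values of a field in one packet with commas.
        for name in line.strip().split(","):
            name = name.strip().lower()
            if name and looks_like_hostname(name):
                names.add(name)
    return sorted(names)
```

Entries such as "<Root>" (how Wireshark displays a query for the DNS root) or names with empty labels fail the check and are dropped.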

Additionally, you can add the -c option to uniq to get a count of occurrences, then re-sort by that count (for example, uniq -c | sort -rn).


Figure 2. Extracting DNS query names, sorting, and counting.
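In Python, the counting step maps naturally onto collections.Counter. The sketch below is a rough equivalent of the uniq -c | sort -rn pipeline; count_names is a hypothetical helper name, and ties are broken alphabetically so the output order is stable.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_names(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Roughly `sed '/^$/d' | sort | uniq -c | sort -rn` in Python."""
    # Strip whitespace and drop empty lines before counting.
    counts = Counter(s for s in (line.strip() for line in lines) if s)
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Names queried only once or twice tend to sit at the bottom of this list, which can be a useful place to look for one-off lookups.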

Extracting from HTTP Host Headers

HTTP is another source of hostnames. It's worth pulling in this information because your packet capture may not contain the corresponding DNS traffic for these hosts (for example, if the lookups were cached or happened before the capture started). The following command grabs the contents of the HTTP Host header, then removes blank lines and duplicates:

tshark -r <pcap> -T fields -e http.host | sed '/^$/d' | sort | uniq

Figure 3. Extracting hostnames from HTTP host field.

Although our focus here has been on domains, note that the Host header can contain an IP address instead of a hostname and can include a port number (host:port), so expect some IP addresses and ports in your output.
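Here is a minimal Python sketch for separating those cases before comparing anything against a domain list. It splits off an optional port (including the bracketed [IPv6]:port form) and uses the standard ipaddress module to tell IP literals from names; the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple


def split_host_header(value: str) -> Tuple[str, Optional[int]]:
    """Split an HTTP Host value into (host, port); port is None if absent."""
    value = value.strip()
    if value.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ":port".
        host, _, rest = value[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif value.count(":") == 1:
        # "name:port" or "IPv4:port".
        host, _, port = value.partition(":")
    else:
        # No port, or something unusual (e.g. an unbracketed IPv6 address).
        host, port = value, ""
    return host.lower(), (int(port) if port.isdecimal() else None)


def is_ip_literal(host: str) -> bool:
    """True if `host` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def classify_hosts(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate Host header values into sorted (domains, ip_addresses)."""
    domains, ips = set(), set()
    for value in values:
        host, _port = split_host_header(value)
        if not host:
            continue
        (ips if is_ip_literal(host) else domains).add(host)
    return sorted(domains), sorted(ips)
```

The IP addresses are worth keeping rather than discarding: HTTP requests addressed directly to an IP, with no hostname, can be interesting in their own right.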

Using Cisco Umbrella List to Examine Extracted Hostnames

Once domains have been extracted from your PCAP, you can begin identifying well-known domains and isolating suspicious ones. For this task I like to use the Cisco Umbrella Popularity Lists. Using the Top 1M list and a simple Python script, you can rank all the domains in your list according to their popularity in the Cisco Umbrella list. When searching for leads, I like to focus on domains not found in the list. Below is the output from a Python script on our GitHub that takes a list of domains and checks it against the Top 1M list.

Figure 4. Output from comparing domains to Cisco Top 1M list (a "Domains Not Found in Top 1 Million" section followed by a "Domains Found in Top 1 Million" section listing each matched domain as "Rank - Domain"; ranks in the sample ranged from 4 to 563265).
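This is not the script referenced above, just a minimal Python sketch of the same idea. It assumes the Top 1M list is the plain two-column rank,domain CSV with no header row (the layout of Umbrella's top-1m.csv), and it uses exact matches only; load_umbrella_ranks, compare_to_top_list and print_report are hypothetical names.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_umbrella_ranks(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'rank,domain' rows into a {domain: rank} lookup table.

    Assumes a two-column CSV with no header row; rows that don't fit
    that shape are skipped.
    """
    ranks: Dict[str, int] = {}
    for row in csv.reader(lines):
        if len(row) >= 2 and row[0].strip().isdigit():
            # Keep the first (best) rank if a domain ever appears twice.
            ranks.setdefault(row[1].strip().lower(), int(row[0]))
    return ranks


def compare_to_top_list(
    domains: Iterable[str], ranks: Dict[str, int]
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split domains into (not_found, found) using exact matches only."""
    unique = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    not_found = sorted(d for d in unique if d not in ranks)
    found = sorted((ranks[d], d) for d in unique if d in ranks)
    return not_found, found


def print_report(not_found: List[str], found: List[Tuple[int, str]]) -> None:
    """Print both sections in the same shape as the sample output."""
    print("*** Domains Not Found in Top 1 Million ***")
    for domain in not_found:
        print(domain)
    print()
    print("*** Domains Found in Top 1 Million ***")
    print("Rank - Domain")
    for rank, domain in found:
        print(f"{rank} - {domain}")
```

To use it, open the downloaded CSV with open("top-1m.csv", newline="") and pass the file object to load_umbrella_ranks, then pass your extracted domains and the resulting dictionary to compare_to_top_list. A dictionary keeps each lookup constant-time, which matters when a large PCAP yields thousands of names.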

Note that we are checking for exact matches here. If you are overwhelmed by domains that aren't found in the list, it may be worth searching on partial matches (for example, checking whether a subdomain's parent domain is listed) and then seeing what's left over. Additionally, you could use Umbrella's Top TLD list to surface domains that have unusual TLDs.
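One way to do partial matching is to strip leading labels from a name until a listed parent domain turns up. The sketch below assumes the Top 1M and Top TLD lists have already been loaded into dictionaries mapping a name (or TLD) to its rank; the function names are hypothetical. It does not consult the Public Suffix List, so multi-label suffixes such as co.uk are treated like any other parent name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def best_partial_match(
    domain: str, ranks: Dict[str, int]
) -> Optional[Tuple[str, int]]:
    """Return (listed_name, rank) for the closest listed parent, or None.

    Tries the full name first, then strips one leading label at a time
    (a.cdn.example.com -> cdn.example.com -> example.com), stopping
    before the bare TLD so that "com" on its own never counts as a match.
    """
    labels = domain.strip().lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in ranks:
            return candidate, ranks[candidate]
    return None


def tld_of(domain: str) -> str:
    """Last label of a domain name, e.g. 'com' for 'www.example.com'."""
    return domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]


def unusual_tlds(domains: Iterable[str], tld_ranks: Dict[str, int]) -> List[str]:
    """Domains whose TLD does not appear in the supplied TLD ranking."""
    return sorted({d.strip().lower() for d in domains if tld_of(d) not in tld_ranks})
```

Whatever still returns None after partial matching, or shows up in unusual_tlds, is a reasonable shortlist to examine first.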

Next Steps

In this blog post we discussed three main ways of extracting domains from a PCAP: using TShark’s hosts output, extracting names from DNS queries, and extracting names from HTTP Host headers. Together these produce a fairly comprehensive view of the domains associated with your PCAP. This information can be used to surface suspicious domains as well as to trim traffic associated with non-suspicious domains. In future blog posts we will discuss methods for further analyzing suspicious domains.