Identifying Malware Campaigns using Splunk and Kippo

14 March 2015

Now that Tango is officially released, I wanted to break down some of the searches that I use in the Tango Splunk app. Maybe these can help anyone who uses Splunk and Kippo, or just help people understand what the app is doing.

So, with that, let's start with the searches that identify possible malware campaigns, meaning multiple attackers using the same URIs, URLs, SHA hashes, or filenames seen from other attackers. This could indicate a large organization using many different hosts to distribute its malware, or just several unrelated hosts using the same malware or the same C2 domains.

It's incredibly useful to be able to identify similar C2 servers, and with these servers, you can start mapping out the infrastructure being used.

The searches in the app are all really similar, since they look for the same pattern and only differ in the field they search on. So, we'll first go over the Potential Malware Campaigns (By URL) search.

sourcetype=kippojson [search sourcetype=kippojson url=* | stats count by session | fields session ] |  transaction session | stats dc(src_ip) AS Count values(session) AS "Session" values(src_ip) as "Attacker IP"  by url | where Count > 1| rename url AS "URL" | sort - Count | fields - Count

Let's break that out to better understand what it's doing:

The line below tells Splunk which sourcetype to look in. Since we have control over the tango_input app, we're able to specify this when we send the data from the Universal Forwarder (UF) to the Indexer, and we chose to send it as kippojson.

sourcetype=kippojson

Next, we are doing a subsearch. This is basically a completely new search, and whatever it returns is used as a filter in the outer search. I'll explain...

So, in the search below we specify the sourcetype again, since this is a new search, and then use url=*, which matches only events where the url field is present and has a value. Next we run the stats command to count the events for each session value, then drop the count field by keeping only the session field with fields session. The subsearch effectively builds one giant "OR" statement for the outer search, something like: session=shahash1 OR session=shahash2 OR session=shahash4.

[search sourcetype=kippojson url=* | stats count by session | fields session ]
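To make the subsearch idea more concrete, here is a rough Python sketch of the same logic. It is only a model of what Splunk does, not how Splunk implements it. The sample events, IPs, and the helper names (sessions_with_url, as_or_filter, apply_filter) are made up for illustration; only the field names session, src_ip, and url come from the search.

```python
# Hypothetical sample events; real Kippo events carry many more fields.
events = [
    {"session": "s1", "src_ip": "198.51.100.7"},
    {"session": "s1", "url": "http://example.test/a.sh"},
    {"session": "s2", "src_ip": "203.0.113.9"},
    {"session": "s3", "src_ip": "192.0.2.44"},
    {"session": "s3", "url": "http://example.test/a.sh"},
]

def sessions_with_url(events):
    """Model of the subsearch: sessions with at least one event that has a url."""
    return sorted({e["session"] for e in events if e.get("url")})

def as_or_filter(sessions):
    """Render the session list as the big OR expression handed to the outer search."""
    return " OR ".join(f"session={s}" for s in sessions)

def apply_filter(events, sessions):
    """Model of the outer search: keep every event from the selected sessions."""
    wanted = set(sessions)
    return [e for e in events if e["session"] in wanted]

sessions = sessions_with_url(events)
print(as_or_filter(sessions))
print(len(apply_filter(events, sessions)), "events kept")
```

Note that the outer search keeps all events from the matching sessions, including the connection events that have no url field. That matters for the next step, because the attacker IP lives on those events.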

Next up, we are running the transaction command on the session field. The transaction command groups all events that share the same session value into one giant event. We are trying to extract the attacker's IP address, which is only present in the initial connection between the attacker and the honeypot, so we need to group the events together so that all the fields are available on the combined event. After running the transaction command, each grouped event carries the fields from every event in that session, including the attacker IP, the different environment variables, and so forth.
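If it helps, the grouping can be pictured with a small Python sketch. This is a simplified model under my own assumptions: it only merges field values per session and ignores everything else transaction does (time ordering, duration, event counts, span limits). The sample events and the function name group_by_session are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events: the IP and the URL arrive on different events.
events = [
    {"session": "s1", "src_ip": "198.51.100.7"},
    {"session": "s1", "url": "http://example.test/a.sh"},
    {"session": "s3", "src_ip": "192.0.2.44"},
    {"session": "s3", "url": "http://example.test/a.sh"},
]

def group_by_session(events):
    """Merge all events that share a session into one record of multivalue fields."""
    grouped = defaultdict(lambda: defaultdict(set))
    for event in events:
        session = event["session"]
        for field, value in event.items():
            if field != "session":
                grouped[session][field].add(value)
    # Convert the sets to sorted lists so the result is easy to read and compare.
    return {
        session: {field: sorted(values) for field, values in fields.items()}
        for session, fields in grouped.items()
    }

merged = group_by_session(events)
for session, fields in merged.items():
    print(session, fields)
```

After the merge, each session record holds both src_ip and url, which is exactly what the later stats step needs.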

Now that we have the attacker's IP address, we run another stats command to get the distinct count (dc) of the src_ip field (how many unique attacker IPs were seen), and rename it to "Count". Next we run values(session), which returns the distinct values of the session field (all the different sessions that the URL was seen in), and rename it to "Session". We then get the values of the attacker's IP with values(src_ip) and rename that to "Attacker IP". Lastly, we group everything by url. Grouping by URL means we list the values of those fields for each URL, so in the end we get a table with each URL and the different values of those fields next to it. If that doesn't make sense, check out the screenshot at the end of this part. After the next "pipe" or "|", we tell Splunk to only give us the results where the number of unique attacker IPs is greater than 1, which is what makes it a "campaign" rather than a one-off attack.

|  transaction session | stats dc(src_ip) AS Count values(session) AS "Session" values(src_ip) as "Attacker IP"  by url | where Count > 1
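Here is a rough Python sketch of what the stats and where stages compute. The input records are hypothetical and shaped like grouped sessions (one record per session, with list-valued fields), and campaigns_by_url is a name I made up for illustration.

```python
from collections import defaultdict

# Hypothetical grouped sessions. Note s3 and s4 come from the same attacker IP.
transactions = [
    {"session": "s1", "src_ip": ["198.51.100.7"], "url": ["http://example.test/a.sh"]},
    {"session": "s3", "src_ip": ["192.0.2.44"], "url": ["http://example.test/a.sh"]},
    {"session": "s4", "src_ip": ["192.0.2.44"], "url": ["http://example.test/b.sh"]},
]

def campaigns_by_url(transactions):
    """Model of: stats dc(src_ip) values(session) values(src_ip) by url | where Count > 1."""
    ips = defaultdict(set)
    sessions = defaultdict(set)
    for record in transactions:
        for url in record.get("url", []):
            ips[url].update(record.get("src_ip", []))
            sessions[url].add(record["session"])
    rows = [
        {
            "url": url,
            "Count": len(ips[url]),             # dc(src_ip)
            "Session": sorted(sessions[url]),   # values(session)
            "Attacker IP": sorted(ips[url]),    # values(src_ip)
        }
        for url in ips
    ]
    # where Count > 1: keep only URLs seen from more than one attacker IP.
    return [row for row in rows if row["Count"] > 1]

for row in campaigns_by_url(transactions):
    print(row)
```

In this sample, b.sh is dropped because only one IP ever used it, while a.sh survives because two different IPs did.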

The last part of the search is mainly cosmetic. It renames url to "URL", sorts the results in descending order by the number of unique attackers, and then removes the Count field.

rename url AS "URL" | sort - Count | fields - Count
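The order of these three steps matters: the sort has to happen while Count still exists. A tiny Python sketch of the same sequence, using hypothetical rows and a made-up helper name (finalize):

```python
# Hypothetical rows, shaped like results that already passed the Count > 1 filter.
rows = [
    {"url": "http://example.test/b.sh", "Count": 2, "Attacker IP": ["192.0.2.44", "203.0.113.9"]},
    {"url": "http://example.test/a.sh", "Count": 3, "Attacker IP": ["192.0.2.44", "198.51.100.7", "203.0.113.9"]},
]

def finalize(rows):
    """Model of: rename url AS "URL" | sort - Count | fields - Count."""
    # rename url AS "URL"
    renamed = [{("URL" if key == "url" else key): value for key, value in row.items()} for row in rows]
    # sort - Count (descending), done before Count is removed
    ordered = sorted(renamed, key=lambda row: row["Count"], reverse=True)
    # fields - Count
    return [{key: value for key, value in row.items() if key != "Count"} for row in ordered]

table = finalize(rows)
for row in table:
    print(row)
```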

After all is said and done, we're left with a table that looks like this:

That's all there is to it, really. You can apply this same logic to other fields as well. In the Tango app, we use this same search but apply it to domains, file hashes, and filenames, and you can do the same for whatever else you think is relevant.
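To show how the whole pattern generalizes to another field, here is a compact end-to-end Python sketch with the field name passed in as a parameter. The field name filename and the sample events are assumptions for illustration only. I am not claiming these are the exact field names the Tango app uses.

```python
from collections import defaultdict

# Hypothetical events; "filename" is an assumed field name.
events = [
    {"session": "s1", "src_ip": "198.51.100.7"},
    {"session": "s1", "filename": "bot.sh"},
    {"session": "s2", "src_ip": "192.0.2.44"},
    {"session": "s2", "filename": "bot.sh"},
    {"session": "s3", "src_ip": "203.0.113.9"},
    {"session": "s3", "filename": "other.sh"},
]

def campaigns_by(events, field):
    """Return {field value: sorted attacker IPs} for values seen from more than one IP."""
    # Group events by session so the IP and the field end up on the same record.
    merged = defaultdict(lambda: defaultdict(set))
    for event in events:
        for key, value in event.items():
            if key != "session":
                merged[event["session"]][key].add(value)
    # Collect the unique attacker IPs for each value of the chosen field.
    ips = defaultdict(set)
    for fields in merged.values():
        for value in fields.get(field, ()):
            ips[value] |= fields.get("src_ip", set())
    # Keep only values shared by more than one IP, with the most-shared first.
    hits = {value: sorted(addrs) for value, addrs in ips.items() if len(addrs) > 1}
    return dict(sorted(hits.items(), key=lambda item: len(item[1]), reverse=True))

print(campaigns_by(events, "filename"))
```

Swapping "filename" for another field name is the Python equivalent of changing the field in the subsearch and in the by clause of the Splunk search.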