Kippo Honeypot Session Analysis using Splunk

16 March 2015

I thought it would be pretty interesting to do some session-based analysis on my honeypot logs, so over the weekend I put together some searches that I hope show some interesting trends and correlations in the data. One of the big things I wanted to accomplish was to find some way to identify how many sessions were bots versus how many were human operators.

It's pretty tough to tell human from bot, but we'll see if anything comes to light with these searches. In this post, I'll show you each search and break out what it does, along with the results of that search. So, let's begin...

First, let's take a look at the number of seconds attackers usually stay on one of our honeypots.

sourcetype=kippojson | transaction session | eval totaltime=round(duration) | stats count by totaltime

The above command searches the sourcetype where our kippo logs live. Next, we run the transaction command to group events together based on their session field. We then create a new field with the eval command, called totaltime, which rounds the value of duration to the nearest whole number. The duration field gets created when you run the transaction command and gives you the time between the first and last event in each transaction (here, each session). Lastly, we run the stats command to see how many times each duration was seen.

Looking at the results below, we can see that most sessions were about 4 seconds in length, with the next highest being 9 seconds.

So, 4 seconds seems really short for a session; that's about enough time to hop on, maybe run 1 or 2 commands, and then bail. Also, that search covers all sessions, so it could mean that attackers didn't even run any commands at all and were just harvesting credentials for later use.
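To make the most common durations easier to spot, the same search can be sorted by count. This is only a small variation on the search above, assuming nothing beyond the standard sort command:

```spl
sourcetype=kippojson | transaction session | eval totaltime=round(duration) | stats count by totaltime | sort - count
```

The sort - count at the end puts the durations seen most often at the top of the table.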

Next, we'll look at the sessions when an attacker actually does enter commands on the honeypot.

The command below again starts with the kippojson search, but this time we'll be using a subsearch. A subsearch is run first, and its results populate the outer search. The subsearch looks in our sourcetype again, but only for events where a command was run (command=*). Next, to get the sessions with commands entered, we run stats count by session | fields session. The stats part gives us one row per session along with a count of its command events, so the last part, | fields session, keeps only the session field and drops the count. The results from this subsearch then end up looking like a giant OR statement, like...

sourcetype=kippojson (session=session1 OR session=session2 OR session=session3)

The last part of the command is basically the same as the search above, where we want to see how many sessions have the same duration.

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | stats count by totaltime

Looking above, we can see the results aren't as drastic as they were for all sessions, although the durations don't look that different between the two. One thing to notice, though, is that sessions with durations in the double digits are more prevalent in this search.

Since most attackers are on for about 3 or 4 seconds, let's see how many commands are normally entered during sessions with durations under 10 seconds:

Below you can see things start the same way as the above searches; however, at the second command from the end we change things up. We want to look for sessions where the totaltime field is less than 10 seconds, then we run | stats count(command) by session to get a count of the number of commands entered in each session.

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime<10 | stats count(command) by session

So, it looks like most sessions have fewer than 10 commands each, with only a few out of 1,741 having more than 2.

How about for sessions longer than 10 seconds?

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime>10 | stats count(command) by session

Above, we can see a huge difference in the number of commands being entered. The highest is 126 (...wonder what this attacker is doing?), and the rest are well over the standard 2 or 3 seen in sessions under 10 seconds. This doesn't really prove anything new, since more time = more commands, but it's crazy to see that difference. We could even break this down and give the number of commands for every session in 10 second increments to see when things really pick up.
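A rough, untested sketch of that 10 second breakdown might look like the search below. The bucket, cmdcount, sessions and avg_commands field names are made up for illustration, and mvlist=command is added on the assumption that we want every command counted, since transaction otherwise keeps only the unique values of a field.

```spl
sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session mvlist=command | eval totaltime=round(duration) | eval cmdcount=mvcount(command) | eval bucket=floor(totaltime/10)*10 | stats count as sessions avg(cmdcount) as avg_commands by bucket | sort bucket
```

Here bucket is the start of each 10 second window (0, 10, 20, ...), so each row would show how many sessions fell in that window and the average number of commands they entered.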

How about we check out the top commands entered during sessions with durations under 10 seconds?

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime<10 | top command

Looks like the majority is uname -m, which makes total sense, since the attacker would want to identify what type of box they are on. Next is help, and I'm not sure why that's up that high: I would assume most sessions under 10 seconds are for general recon, so why would the attacker run help? Moving on, we see more recon commands (id, uname, uname -a), then we see the attacker grabbing some malware.

How about command usage over 10 seconds?

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime>10 | top command

More of the same here really; however, we see some newcomers (free, ps) along with stopping the firewall. Only one piece of malware shows up in this one, but we've been seeing that piece for a long time, and it usually hits all of our boxes a few times a day.

Next, we'll look at the top countries responsible for the sessions under 10 seconds:

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime<10 | iplocation prefix=iploc_ allfields=true src_ip | top iploc_Country limit=5

In the above search, we are doing something new, which is | iplocation prefix=iploc_ allfields=true src_ip. The iplocation command looks up an IP address in a database of IP/location pairs. We also specify that we want the prefix to be iploc_, so we are left with fields such as iploc_Country, iploc_Region, etc.

Next, top countries for sessions over 10 seconds:

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime>10 | iplocation prefix=iploc_ allfields=true src_ip | top iploc_Country limit=5

Moving on, let's check out how many sessions that are less than 10 seconds contain malware:

sourcetype=kippojson [ search sourcetype=kippojson shasum=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime<10 | stats count

Next, how about the sessions above 10 seconds that contain malware?
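The search for this case presumably mirrors the previous one with the comparison flipped. The version below is a reconstruction under that assumption, not a copy of the search that was actually run:

```spl
sourcetype=kippojson [ search sourcetype=kippojson shasum=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | search totaltime>10 | stats count
```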

That's a huge jump in the number of sessions with malware, going from sessions under 10 seconds to sessions over 10 seconds. It kind of does, and doesn't, surprise me: I figured sessions that short would just be quick "hop on, download malware, hop off" types of sessions, but it seems like more of them are just scoping out the box before they attempt anything.

Let's look at the average time a session is when malware is involved:

sourcetype=kippojson [ search sourcetype=kippojson shasum=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | stats avg(totaltime)

Next, let's look at the average number of commands entered when a session is over 10 seconds:

sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | where totaltime > 10 | stats count(command) as msgcount by session | stats avg(msgcount)

And lastly, under 10 seconds:
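Assuming the same approach as the over 10 seconds search, with only the comparison changed, the search would look something like this (a reconstruction, not the original):

```spl
sourcetype=kippojson [ search sourcetype=kippojson command=* | stats count by session | fields session ] | transaction session | eval totaltime=round(duration) | where totaltime < 10 | stats count(command) as msgcount by session | stats avg(msgcount)
```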

Again, no surprises here. More time = more commands.

So, all the above mainly dealt with timing, commands entered, etc. But, what other type of information can we gather? How about what client version they are using?

sourcetype=kippojson | rex field=client "SSH-2.0-(?<client2>[^-|_]+)" | stats count by client2

The above search counts how often each SSH client was used by the attackers, but we do another new thing in this search, which is rex. Rex is for search-time field extraction, or string replacement and character substitution. Splunk Enterprise uses Perl-compatible regular expressions (PCRE), so we can build a regex that grabs the SSH client value, taking only the part after SSH-2.0- and before the next - or _ (the | inside the brackets is treated as a literal character, so a pipe would end the match as well). This is so we can take multiple versions of the same client and group them together.

Looking at the results, we can see PuTTY wins by a long shot, followed by libssh. The one we are most interested in is OpenSSH, since that is the most likely to be used by an actual user and not a bot. The other ones can be utilized by bots relatively easily. This also falls in line with the general idea that there are more bots than humans on these honeypots: at 611 sessions, and not even 1% of the total, it seems a fairly safe assumption that those are the humans.

I'm sure there is a ton more I can do with this; however, I think this is a good start, and hopefully it provided you with some interesting data, or maybe you picked up a new Splunk search or two. There's one other thing I'm working on, which is almost a "smoking gun" for human actors, but it's going to take a bit to get that working. I'll post an update once that is done.
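To see what the regex pulls out without touching the honeypot data, it can be tried against a single hand-made value. This is just a sketch: the banner string below is an illustrative example rather than one taken from the logs, and it assumes a Splunk version that has the makeresults command.

```spl
| makeresults | eval client="SSH-2.0-OpenSSH_6.0p1 Debian-4" | rex field=client "SSH-2.0-(?<client2>[^-|_]+)" | table client client2
```

For that banner, client2 comes out as OpenSSH, because the match stops at the first underscore. A banner such as SSH-2.0-libssh-0.6.0 would stop at the hyphen and give libssh in the same way.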