Hey everyone, today we're doing something different. This is a joint blog post from Ethan Dodge (@__eth0) and me, in which we look at how well the Alexa Top 100 domains are covered against phishing domains, which also exposes the best attack vectors for phishing against them.

We're going to be using a combination of the new DNS reconnaissance tool DNStwist as well as some custom Python scripts to gather and analyze all the information we find, which we'll include in this post if you want to follow along or do your own research.

Overview

Here's a rundown of what we'll be doing to get all the information we need. We'll start by pulling down the Alexa Top 100 domains, then write a script to run them through a modified version of DNStwist, which gives us each permutated domain along with its permutation type (Bitsquatting, Insertion, Omission, Replacement, etc.). Next, we'll do a host lookup on each permutated domain to get the IP address hosting it. Lastly, we'll do a Whois lookup and a Reverse DNS lookup on that IP and compare the Whois owner and Pointer record information against the original domain to see if they match up.

After we have the comparison data, we'll be able to calculate what types of permutations are the most covered against attacks (meaning the original domain registered the permutated domain to possibly prevent phishing attempts), as well as the types of permutations that are least covered.

Grabbing the Data

First thing we need to do is get the Alexa Top One Million sites and just narrow that down to our scope of 100.

wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Then we'll unzip the archive and cut it down to the first 100 domains.

unzip top-1m.csv.zip
awk -F ',' '{print $2}' top-1m.csv | head -n 100 > alexatop100.txt

This will give us something that looks like this:

https://i.imgur.com/4BMakl1.png

Getting Permutations of Domains

Starting out, we'll need to use DNStwist to get a list of permutations. We modified the original script so it doesn't print extra information we don't need; we only wanted the type of permutation and the resulting domain.

Here's a screenshot of the resulting script being run using google.com as an example.

https://i.imgur.com/ewhsmic.png

If you want the modified version of dnstwist we used, you can grab it here.

Now that we have our list of domains, we'll use this bash one-liner to loop through each of the domains, run it through our modified dnstwist, and output the results into its own file in a new directory:

mkdir -p ~/Desktop/alexatop100; while read -r domain; do python dnstwist.py "$domain" > ~/Desktop/alexatop100/"$domain"; done < ~/Desktop/alexatop100.txt

Running the above takes about 5 seconds and gives us a directory looking like this:

https://i.imgur.com/wLbeUqU.png

Host Lookup

Next on our plate is a host lookup on the resulting permutated domains. We want to end up with a text file containing the permutated domain, the permutation type, and the IP address it resolves to, or the string "NXDOMAIN" if the domain doesn't resolve.
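As a rough illustration of what such a lookup script could look like, here is a minimal sketch. It is not the hostlookup.py we ran; it assumes each input line has the form "perm_type domain", and the function names (resolve, lookup_lines, main) are made up for this example. The output order (domain, IP, permutation type) follows the field order used later in the Splunk config.

```python
import socket


def resolve(domain, resolver=socket.gethostbyname):
    """Return the IPv4 address for domain, or "NXDOMAIN" if it does not resolve."""
    try:
        return resolver(domain)
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError; UnicodeError covers invalid labels
        return "NXDOMAIN"


def lookup_lines(lines, resolver=socket.gethostbyname):
    """Yield "domain ip perm_type" for every "perm_type domain" input line.

    The input layout is an assumption about the modified dnstwist output.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        perm_type, domain = parts
        yield f"{domain} {resolve(domain, resolver)} {perm_type}"


def main(path):
    """Write the results next to the input file, suffixed with _hostlookup."""
    with open(path) as src, open(path + "_hostlookup", "w") as dst:
        for row in lookup_lines(src):
            dst.write(row + "\n")
```

Calling main(sys.argv[1]) from a small wrapper would make it usable from a shell loop. The resolver argument exists only so the logic can be exercised without touching the network.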

Here's the bash one-liner we used to look through all the permutations for each domain and run the host lookup on each, and then add it to a new file for future analysis.

for file in *; do python hostlookup.py $file; done

After letting the above command run for about 30 minutes or so, we're left with a directory that looks like this:

https://i.imgur.com/A5nLMyr.png

You'll notice the directory now has another 100 files suffixed with _hostlookup. Inside each file, we see each permutated domain with the IP address it resolved to.

https://i.imgur.com/dOoIYtA.png

Reverse DNS Lookup

Initially we were going to run a Reverse DNS lookup against the IPs to see what Pointer records they had; however, we thought a Whois lookup would be more reliable. In any event, here are the steps we took to run a Reverse DNS lookup on all the domains and permutations.

Next we wrote a Python script to grab the pointer record returned for each permutation's IP. Its output includes the domain, IP, permutation type, pointer record, and a True or False value depending on whether the original domain name appears in the hostname we grabbed.
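A minimal sketch of that idea follows. It is not the exact script we ran: the function names, the "NONE" placeholder for a missing record, and the choice to compare only the first label of the original domain ("amazon" for amazon.co.uk) are assumptions made for illustration.

```python
import socket


def ptr_record(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or "NONE" if there is no record."""
    if ip == "NXDOMAIN":
        return "NONE"  # the permutated domain never resolved, nothing to reverse
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return "NONE"


def rdns_row(original_domain, domain, ip, perm_type, resolver=socket.gethostbyaddr):
    """Build one "domain ip perm_type hostname is_match" output line."""
    hostname = ptr_record(ip, resolver)
    # Crude substring test: does the original name show up in the PTR hostname?
    label = original_domain.split(".")[0].lower()
    is_match = label in hostname.lower()
    return f"{domain} {ip} {perm_type} {hostname} {is_match}"
```

A substring test like this is cheap but noisy. Very short names will match unrelated hostnames, which is one reason the results should be treated as unvetted.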

Running that script in another bash one-liner like this, for file in *; do python rdnslookup.py $file; done, we get another 100 files in our directory that are suffixed with _rdns.

https://i.imgur.com/zGDQJMf.png

Inside each file we can see the resulting Pointer record and the True or False string.

https://i.imgur.com/hpTms00.png

Moving on to the Whois lookup...

Whois Lookup

The last part we need to code before we can analyze the data is the Whois lookup on all the IP Addresses we grabbed from the host lookup step.

With this part, we just want to grab the description field of the Whois info, which should tell us the company that owns that IP Address. Between the Whois and Reverse DNS lookups, we should be able to determine if the owner of the IP Address matches the permutated domain.
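Here is a rough sketch of how the owner could be pulled out of raw Whois text. It assumes the system whois client is installed, and it checks a few field names beyond descr (OrgName and owner) because registries label the owning organization differently. The function names and the "UNKNOWN" placeholder are hypothetical and not taken from our script.

```python
import subprocess

# Field names that typically carry the owning organization (an assumption:
# descr for RIPE/APNIC-style output, OrgName for ARIN, owner for LACNIC)
OWNER_FIELDS = ("descr", "orgname", "owner")


def whois_text(ip):
    """Run the system whois client for ip and return its raw output."""
    result = subprocess.run(["whois", ip], capture_output=True, text=True, timeout=30)
    return result.stdout


def owner_from_whois(text):
    """Return the first owner-like field value, or "UNKNOWN" if none is found."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in OWNER_FIELDS and value.strip():
            return value.strip()
    return "UNKNOWN"


def owner_matches(original_domain, text):
    """Return (owner, is_match) using a simple substring comparison."""
    owner = owner_from_whois(text)
    label = original_domain.split(".")[0].lower()
    return owner, label in owner.lower()
```

For a live lookup you would pass whois_text(ip) as the text argument. Keeping the parsing separate from the network call makes it easy to test against saved output.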

Now that we have our final data from two different sources (Whois and Reverse DNS), we can run some statistics on it to answer the questions we asked earlier in the post. First things first though, we'll need to get Splunk set up to ingest the data.

Splunk Setup

In order for Splunk to recognize the fields, we'll configure the props.conf file in /opt/splunk/etc/system/local/ with the following settings:

[phishing]
REPORT-phishing = REPORT-phishing

[whois]
REPORT-whois = REPORT-whois  

Next, we edit the transforms.conf file in /opt/splunk/etc/system/local/ like so:

[REPORT-phishing]
DELIMS = " "  
FIELDS = "domain","ip","perm_type","hostname","is_match"

[REPORT-whois]
DELIMS = " "  
FIELDS = "domain","ip","perm_type","owner","is_match"  

That's all that needs to be done in order to parse the events, which will now look something like this:

https://i.imgur.com/dqKiCa4.png
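To make the delimiter-based extraction concrete, here is a small Python sketch that mimics what the DELIMS and FIELDS settings do to one event: split on a space, then pair the values with the field names in order. This only imitates the behavior and is not how Splunk implements it. The sample event is made up.

```python
# Field names copied from the [REPORT-phishing] stanza
PHISHING_FIELDS = ["domain", "ip", "perm_type", "hostname", "is_match"]


def extract_fields(event, fields, delim=" "):
    """Split event on delim and pair each value with a field name, in order."""
    return dict(zip(fields, event.split(delim)))


# Hypothetical event line in the "domain ip perm_type hostname is_match" layout
event = "amazonn.com 192.0.2.10 Repetition example-host.invalid False"
parsed = extract_fields(event, PHISHING_FIELDS)
print(parsed["perm_type"], parsed["is_match"])
```

One consequence of a space delimiter is that any value containing a space (a company name in Whois output, for instance) would spill into the next field, so values need to be single tokens.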

Before We Begin

Before we get into the specific results between Whois and Reverse DNS, it'll help if we identify the different types of permutations, provided by Lenny Zeltser on his blog:

  • Bitsquatting, which anticipates a small portion of systems encountering hardware errors, resulting in the mutation of the resolved domain name by 1 bit (e.g., xeltser.com).
  • Homoglyph, which replaces a letter in the domain name with letters that look similar (e.g., ze1tser.com).
  • Repetition, which repeats one of the letters in the domain name (e.g., zeltsser.com).
  • Transposition, which swaps two letters within the domain name (e.g., zelster.com).
  • Replacement, which replaces one of the letters in the domain name, perhaps with a letter in proximity of the original letter on the keyboard (e.g., zektser.com).
  • Omission, which removes one of the letters from the domain name (e.g., zelser.com).
  • Insertion, which inserts a letter into the domain name (e.g., zerltser.com).
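A few of these permutation types are simple enough to sketch in a handful of lines. The code below is an illustration of the definitions above, not DNStwist's implementation. It works on the name without the TLD, and the homoglyph map is a tiny made-up subset.

```python
HOSTNAME_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789-")

# Tiny illustrative subset of look-alike characters (an assumption)
GLYPHS = {"l": ["1"], "o": ["0"]}


def bitsquatting(name):
    """Flip one bit of one character, keeping only valid hostname characters."""
    results = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in HOSTNAME_CHARS:
                results.add(name[:i] + flipped + name[i + 1:])
    return results


def repetition(name):
    """Repeat each character once."""
    return {name[:i] + ch + name[i:] for i, ch in enumerate(name)}


def transposition(name):
    """Swap each pair of adjacent, different characters."""
    return {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
        if name[i] != name[i + 1]
    }


def omission(name):
    """Drop each character once."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def homoglyph(name):
    """Replace a character with a look-alike from GLYPHS."""
    return {
        name[:i] + glyph + name[i + 1:]
        for i, ch in enumerate(name)
        for glyph in GLYPHS.get(ch, [])
    }
```

Running these on "zeltser" reproduces the examples from the list: xeltser, zeltsser, zelster, zelser and ze1tser each show up in the corresponding set.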

Whois Analysis

Ok, let's jump into the data! We'll start with the analysis of the whois data.

Below is a list of the top permutation types registered:

sourcetype=whois | top perm_type

Permutation Type Count
Replacement 2002
Insertion 1849
Bitsquatting 1347
Omission 454
Repetition 400
Transposition 347
Homoglyph 335
Concourse 313
Subdomain 193
Hyphenation 146

Alright, out of all the registered domains, how many permutated domains are potentially registered by the original domain owner?

sourcetype=whois is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 460 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count
Insertion 146
Replacement 130
Bitsquatting 53
Repetition 41
Omission 35
Transposition 23
Homoglyph 19
Hyphenation 11
Subdomain 2
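Raw counts favor the permutation types that produce the most candidates in the first place, so one way to compare types is to divide the matched count by the registered count for each type. The sketch below does that arithmetic with the numbers from the two Whois tables above. The coverage function is a hypothetical helper, and the ratio inherits all the caveats of the unvetted is_match field.

```python
# Counts copied from the Whois tables above
registered = {
    "Replacement": 2002, "Insertion": 1849, "Bitsquatting": 1347,
    "Omission": 454, "Repetition": 400, "Transposition": 347,
    "Homoglyph": 335, "Concourse": 313, "Subdomain": 193, "Hyphenation": 146,
}
matched = {
    "Insertion": 146, "Replacement": 130, "Bitsquatting": 53,
    "Repetition": 41, "Omission": 35, "Transposition": 23,
    "Homoglyph": 19, "Hyphenation": 11, "Subdomain": 2,
}


def coverage(registered, matched):
    """Return (perm_type, matched/registered) pairs, highest ratio first."""
    rates = {t: matched.get(t, 0) / n for t, n in registered.items() if n}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


for perm_type, rate in coverage(registered, matched):
    print(f"{perm_type:<14}{rate:6.1%}")
```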

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=whois is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 29
microsoft.com 28
booking.com 26
amazon.co.uk 25
yahoo.com 15
amazon.in 9
netflix.com 7
wikipedia.org 2
yandex.ru 1
msn.com 1
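The rex clause in the search above pulls the original domain out of the source path: it captures everything after /tmp/ up to the first underscore. Here is the same pattern as a Python sketch. Note that Python spells named groups (?P<name>...), and the example path is illustrative rather than one of our actual file names.

```python
import re

# Same idea as the rex pattern: capture up to the first underscore after /tmp/
ORIGINAL_DOMAIN = re.compile(r"/tmp/(?P<original_domain>[^_]+)")


def original_domain(source):
    """Return the original domain encoded in a source path, or None."""
    match = ORIGINAL_DOMAIN.search(source)
    return match.group("original_domain") if match else None


print(original_domain("/tmp/amazon.co.uk_rdns"))
```

This works because hostnames can't contain underscores, so the first underscore in the file name always marks the start of the suffix.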

Now, let's just do the most protected domains regardless of permutation:

sourcetype=whois is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 82
amazon.co.uk 63
microsoft.com 62
booking.com 55
amazon.in 54
yahoo.com 44
netflix.com 29
bing.com 17
apple.com 10
wikipedia.org 6

Looks like Amazon is the most concerned about someone ripping off their domain :)

Reverse DNS Analysis

Alright, let's move on to the Reverse DNS analysis.

Let's get the most common permutation types registered.

sourcetype=phishing | top perm_type

Permutation Type Count
Replacement 1163
Insertion 1025
Bitsquatting 796
Omission 236
Repetition 227
Homoglyph 203
Transposition 194
Subdomain 98
Hyphenation 95

Alright, out of all the registered domains, how many permutated domains are registered by the original domain owner?

sourcetype=phishing is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 381 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count
Insertion 114
Replacement 108
Bitsquatting 47
Repetition 32
Omission 27
Transposition 25
Homoglyph 17
Hyphenation 10
Subdomain 1

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=phishing is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 29
booking.com 26
amazon.co.uk 25
yahoo.com 17
amazon.in 9
netflix.com 4
yandex.ru 1
wikipedia.org 1
msn.com 1
blogspot.com 1

Now, let's just do the most protected domains regardless of permutation:

sourcetype=phishing is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 82
amazon.co.uk 63
yahoo.com 56
booking.com 55
amazon.in 54
netflix.com 19
bing.com 12
google.de 8
google.com 7
wikipedia.org 5

Just like the Whois data, Amazon takes the cake for most permutated domains registered. It makes sense though, since that would be a great site to phish for credentials.

DDoS Protection Sites

We knew that we would have some incorrect data, since not every registered domain points back to something that names its owner. For example, wikipedia.com is owned by Wikimedia, so our string match wouldn't count it as true.

One of the big things we noticed is the number of domains resolving to prolexic.com, a DDoS protection service that companies use to absorb DDoS attacks against their sites. We doubted that phishers or other malicious actors would enlist the help of a DDoS protection service, since it probably costs a lot of money based on traffic. Based on this assumption, we are going to count prolexic.com hits as true and see what kind of results we get.

Let's rerun some of the original searches now...

First, how many permutated domains are protected?

sourcetype=phishing | eval ddos=if(searchmatch("hostname=*prolexic*"),"True","False") | search ddos="True" OR is_match="True" | stats count

We see that there are now 808 domains protected instead of the original 381. Big change!
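For anyone following along without Splunk, the same recount can be sketched in Python over already-parsed events. This assumes each event is a dict with hostname and is_match string fields, as in the Splunk field list. The helper names and the sample events are made up.

```python
def is_protected(event):
    """True if the PTR record mentions prolexic or the original match was True."""
    behind_ddos = "prolexic" in event["hostname"].lower()
    return behind_ddos or event["is_match"].lower() == "true"


def count_protected(events):
    """Count events that pass either test."""
    return sum(1 for event in events if is_protected(event))


# Hypothetical events, for illustration only
events = [
    {"hostname": "unknown.prolexic.com", "is_match": "False"},
    {"hostname": "www.amazon.com", "is_match": "True"},
    {"hostname": "parked.example.invalid", "is_match": "False"},
]
print(count_protected(events))
```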

Now we'll see what permutation types are protected against the most:

Permutation Type Count
Replacement 243
Insertion 211
Bitsquatting 139
Repetition 48
Transposition 45
Omission 44
Homoglyph 40
Hyphenation 19
Subdomain 19

Lastly, what domains are protected the most:

Domain Count
amazon.com 91
amazon.co.uk 66
booking.com 59
yahoo.com 58
amazon.in 56
pinterest.com 41
netflix.com 26
google.es 21
paypal.com 19
imdb.com 15

Final Thoughts

So, some interesting things came out of this research. First, the most common types of permutated domains that companies seem to register use the replacement or insertion techniques (e.g., netflox.com or netfliox.com). We also found that a large share of the protected permutated domains sit behind a DDoS protection service: counting prolexic.com hits took us from 381 to 808 protected domains (this isn't really a surprise, just interesting to note). Lastly, we see that amazon.com, booking.com and yahoo.com are the most protected against potential phishing attempts.

That last note isn't surprising either, but it's interesting to see which companies took the time and steps to register these other domains. Amazon and Yahoo! are definitely sites I would expect to see, Amazon more so than Yahoo. Still, since Yahoo has been around for a while, it makes sense.

If you're interested in checking out the data we used, you can find it here. If you have any questions or comments about this info, please feel free to reach out to us on Twitter, @brian_warehime or @__eth0.