DGA botnet domains: on false alarms in detection

Date of publication: 17/04/2015, CERT Polska

Domain Generation Algorithms (DGAs) are often used in botnets to create specially crafted domain names that point to C&C servers. Their main purpose is to make connections to these servers harder to block (for example with domain blacklists) and to protect the C&C channel, and the botnet itself, from a takeover. Domains generated this way are often composed of random characters, for example <span class="text">gdvf5yt.pl</span>. Such names look nonsensical, but they still allow the botmaster to manage the bots.

While working on detection of algorithmically generated domains (we have covered cases of their usage here and here), we found domains that look just as strange as those used by botnets but serve different, legitimate purposes. Identifying these domains helps eliminate a large number of false alarms in DGA botnet detection systems. In this entry we describe how such domains are used in a non-malicious way; in a future post we will look into cases that can be seen as threats.

Strange requests of AV software

First we will look at domains generated by the McAfee GTI File Reputation system, which is used, for example, in their corporate AV solution and HIPS. It has been presented in detail in “DNS Noise: Measuring the Pervasiveness of Disposable Domains in Modern DNS Traffic” by Yizheng Chen et al. When a suspicious file (executable, PDF, APK) is detected and cannot be confirmed as malicious using the local detection base, a special DNS query is sent to the reputation system. The query carries some basic information about the file and its hash, the version of the McAfee system and information about the execution environment. All of these are encoded as higher-level labels (subdomains), and one of the suffixes <span class="text">avqs.mcafee.com</span> or <span class="text">avts.mcafee.com</span> is appended.

Example requests:

0.0.0.0.1.0.0.4e.135jg5e1pd7s4735ftrqweufm5.avqs.mcafee.com
0.0.0.0.1.0.0.4e.13cfus2drmdq3j8cafidezr8l6.avqs.mcafee.com
0.0.0.0.1.0.0.4e.13kqas3qjj46ttkdhastkrdsv6.avqs.mcafee.com
0.0.0.0.1.0.0.4e.13pq3hfpunqn1d51pmvbdkk5s6.avqs.mcafee.com
0.0.0.0.1.0.0.4e.13qh71bf782qb54uzz9uhdz4mq.avqs.mcafee.com

The response contains an address from the 127.0.0.0/16 network, which encodes the reputation of the queried file.

Blacklists

Other suspicious-looking queries are generated by lookups against DNS blacklists. In this case the name is not only a strange group of characters; it can also be a multilevel domain name containing natural-language words. An example of such a blacklist is the Spamhaus DBL. It lets a client ask whether a domain is malicious by sending a specially crafted DNS query. For example, querying for the IP address of <span class="text">gmail.com.dbl.spamhaus.org</span> returns an NXDomain error (the name does not exist), which means that the domain is not on the blacklist.

> host gmail.com.dbl.spamhaus.org
Host gmail.com.dbl.spamhaus.org not found: 3(NXDOMAIN)

When we ask about <span class="text">iqumgmcqwuqgaaus.org.dbl.spamhaus.org</span>, the response is the address 127.0.1.2, which means that this domain could be spam-related.

> host iqumgmcqwuqgaaus.org.dbl.spamhaus.org
iqumgmcqwuqgaaus.org.dbl.spamhaus.org has address 127.0.1.2
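The lookup convention described above can be sketched in a few lines of Python. This is only an illustration: the function names `dbl_query_name` and `is_listed` are made up for this example, and the sketch treats any 127.x.x.x answer as "listed" without interpreting individual return codes.

```python
import socket


def dbl_query_name(domain, zone="dbl.spamhaus.org"):
    """Build the DNS name used to ask a domain blacklist about `domain`."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, resolve=socket.gethostbyname):
    """Return True if the blacklist answers with a 127.x.x.x address.

    `resolve` is injectable so the logic can be exercised without network
    access. Any resolution failure is treated as "not listed" here, which
    is a simplification: socket.gaierror does not distinguish NXDOMAIN
    from other lookup errors.
    """
    try:
        answer = resolve(dbl_query_name(domain))
    except socket.gaierror:
        return False
    return answer.startswith("127.")
```

From the point of view of a DGA detector, the important part is `dbl_query_name`: the queried name embeds another, possibly random-looking, domain in front of a constant blacklist suffix.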

Another service that works in a similar manner is countries.nerd.dk. It provides country-level geolocation of hosts. The host’s IP address is prepended to the query name, and the ISO country code is encoded in the returned address. For example:

> host 8.8.8.8.zz.countries.nerd.dk
8.8.8.8.zz.countries.nerd.dk has address 127.0.3.72

This response means that the queried host is located in the USA (3*256+72=840, and the ISO code of the US is 840).
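The arithmetic above can be written as a short helper. This is a minimal sketch that assumes the country code is always carried in the last two octets of the answer, as in the example; `decode_country_code` is a name chosen for this illustration.

```python
import ipaddress


def decode_country_code(answer):
    """Decode the numeric ISO country code from a 127.0.x.y style answer."""
    octets = ipaddress.IPv4Address(answer).packed
    # The third octet is the high byte, the fourth octet is the low byte.
    return octets[2] * 256 + octets[3]


print(decode_country_code("127.0.3.72"))  # 3*256 + 72 = 840
```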

Internationalized Domain Names (IDNs)

The use of Internationalized Domain Names (IDNs) is another source of DNS lookups for strange-looking domains. IDNs are described in RFC 3490. Some examples: ąćęłńóśźż.pl is processed by DNS as xn--da9ag6e8jma6nxjsa.pl, and łódź.pl as xn--d-uga0v4h.pl.

As you can see, names with regional characters are transformed into ASCII strings that look random and somewhat strange. However, a distinctive feature of IDNs, the “xn--” prefix, indicates that they have been encoded using Punycode. Unfortunately, this mechanism has already been used by attackers (this will be shown in the next entry).
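The conversion can be reproduced with Python's built-in `idna` codec, which implements the RFC 3490 rules. The helper names below (`to_ascii`, `to_unicode`, `looks_like_idn`) are illustrative only; a detection system could use a check like `looks_like_idn` to decode such labels before judging how random they look.

```python
def to_ascii(name):
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    return name.encode("idna").decode("ascii")


def to_unicode(name):
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return name.encode("ascii").decode("idna")


def looks_like_idn(name):
    """Return True if any label carries the "xn--" Punycode prefix."""
    return any(label.lower().startswith("xn--") for label in name.split("."))


print(to_ascii("łódź.pl"))             # xn--d-uga0v4h.pl
print(to_unicode("xn--d-uga0v4h.pl"))  # łódź.pl
```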

Chrome/Chromium

The Chrome and Chromium browsers are a very good example of non-malicious usage of a DGA. After startup they send DNS queries for three random-looking domains. The main purpose is to check whether NXDomain responses are hijacked in a particular network. A string of random characters is used both as a TLD and as a second-level domain prepended to the host’s local domain, for example .lan or .local_example.

We have seen this behavior in desktop browsers (Chrome and Chromium on Ubuntu Linux, Chrome on Windows 7) and also in the mobile version (Android KitKat). In tests on Windows 7 Professional, additional LLMNR and NBNS queries were sent, but there were no TLD queries.

The NXDomain hijack test is needed to check whether some features of the omnibox (the combined address and search bar) will work properly. For example, when a single word is entered into the omnibox, the browser does not know whether the user is requesting a local resource or typing a search phrase. Chromium presents the search results by default, but in the background it also queries for a local resource. If that query succeeds, a suggestion with a redirection link appears. If NXDomain responses are hijacked, every such lookup appears to succeed, so the suggestion mechanism would return incorrect results.
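The idea behind such a probe can be sketched as follows. This is a simplified illustration, not the browser's actual implementation: the function names, the label length of 7 to 15 lowercase letters and the rule "hijacked if every random name resolves" are assumptions made for this example.

```python
import random
import socket
import string


def random_probe_names(count=3, local_suffix=None, rng=random):
    """Generate random-looking host names, optionally under a local domain."""
    names = []
    for _ in range(count):
        length = rng.randint(7, 15)  # assumed label length, for illustration
        label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        names.append(f"{label}.{local_suffix}" if local_suffix else label)
    return names


def nxdomain_hijacked(names, resolve=socket.gethostbyname):
    """Return True if every random name resolves.

    Random names should not exist, so an honest resolver fails on them.
    If all of them resolve, NXDomain responses are probably being rewritten.
    """
    if not names:
        return False
    for name in names:
        try:
            resolve(name)
        except socket.gaierror:
            return False
    return True
```

A call such as `nxdomain_hijacked(random_probe_names(3, "lan"))` produces exactly the kind of DNS traffic described above: a handful of lookups for names that no human would type.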

Network experiments and other queries for random-looking Google domains

Chromium provides some other examples of queries for strange domains. The browser asks about domains in the zones <span class="text">metric.gstatic.com</span>, <span class="text">metric.ipv6test.com</span> and <span class="text">metric.ipv6test.net</span>. Similar requests have been reported by the authors of the previously mentioned “DNS Noise…”. Judging by their structure, these queries are probably used for some IPv6 metrics.

p5-dokjadpcjyjcq-v7cpryyjjoqzq3ry-930172-i1-v6exp3-v4.metric.gstatic.com
p5-dokjadpcjyjcq-v7cpryyjjoqzq3ry-930172-i2-v6exp3-ds.metric.gstatic.com
p5-dokjadpcjyjcq-v7cpryyjjoqzq3ry-930172-s1-v6exp3-v4.metric.gstatic.com
p5-jpcmcphku4iva-6pg7rmohcbnmcxcz-838302-i1-v6exp3-ds.metric.ipv6test.com
p5-jpcmcphku4iva-6pg7rmohcbnmcxcz-838302-i2-v6exp3-ds.metric.ipv6test.net

Other queries contain the <span class="text">pack.google.com</span> domain. They are sent by both Chrome and Chromium. We cannot yet tell for sure what they are used for, but they are probably linked to application updates.

r4---sn-o5n3-f5fe.c.pack.google.com
r3---sn-o5n3-f5fe.c.pack.google.com
r5---sn-uxap5nvoxg5-j2ie.c.pack.google.com

Content Delivery Networks

DGAs are often used to generate domain names of CDN servers, although this is not a general rule. The suffix of such domains is constant and only the prefixes are generated algorithmically. Some examples:

a1294.w20.akamai.net
dnn506yrbagrg.cloudfront.net
a98dc034c7781a941eba-bac02262202668bbe918ea9fb5289cd2.r58.cf2.rackcdn.com
srv-2015-04-07-17.pixel.parsely.com

This is a very common case of non-malicious DGA usage, and it becomes more common as the utilization of CDNs increases.
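Because the suffix is constant in the cases described so far, one simple way to suppress such false alarms is to match candidate domains against a list of known suffixes before classifying them. The sketch below is an illustration rather than a description of any particular detection system: the list contains only suffixes that appear in this article, and the names `KNOWN_BENIGN_SUFFIXES` and `benign_suffix` are made up for this example.

```python
# Suffixes taken from the examples in this article; a real list would
# need to be maintained and reviewed separately.
KNOWN_BENIGN_SUFFIXES = (
    "avqs.mcafee.com",
    "avts.mcafee.com",
    "dbl.spamhaus.org",
    "zz.countries.nerd.dk",
    "metric.gstatic.com",
    "metric.ipv6test.com",
    "metric.ipv6test.net",
    "pack.google.com",
    "akamai.net",
    "cloudfront.net",
    "rackcdn.com",
)


def benign_suffix(domain):
    """Return the matching known suffix, or None if there is no match."""
    name = domain.rstrip(".").lower()
    for suffix in KNOWN_BENIGN_SUFFIXES:
        # Match whole labels only, so "evilcloudfront.net" is not accepted.
        if name == suffix or name.endswith("." + suffix):
            return suffix
    return None


print(benign_suffix("dnn506yrbagrg.cloudfront.net"))  # cloudfront.net
print(benign_suffix("gdvf5yt.pl"))                    # None
```

Matching on whole labels matters here: a plain string suffix comparison would also accept a look-alike registered domain that merely ends with the same characters.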

Other sources of domains which could be misclassified

Among other sources of domains which could be misclassified as DGA botnet domains we could point out:

• domains with spelling mistakes: <span class="text">exmaple.com</span> or <span class="text">examplec.om</span> instead of <span class="text">example.com</span>
• addresses in the pseudo-TLD <span class="text">.onion</span>
• many blogging platforms, where the address of each blog is added as a third-level domain

The last case can serve as an example of how detection of DGA botnet domains using lexical features depends on the natural language used. For example, the Polish and English domains <span class="text">ma9strona</span> and <span class="text">my9site</span>, which mean nearly the same, could be perceived as suspicious by a classifier (human or machine) that works with a natural language other than the one in which the domain was created. Another question is whether such domains could be generated by a DGA. Unfortunately, the example of the Matsnu botnet shows that there are algorithms which create domains very similar to those created directly by humans.

Summary

The examples of domains presented above are not a definitive list. Rather, they provide an insight into the main categories and applications of legitimate DGAs. In this part we have shown their non-malicious use; in part two we will present examples in which DNS and random-looking domains can be considered manifestations of a threat.