Domain Generation Algorithms (DGAs) are often used in botnets to create specially crafted domain names that point to C&C servers. The main purpose is to make connections to these servers harder to block (for example with domain blacklists) and to protect the C&C channel (and the botnet itself) from a takeover. Domains generated this way are often composed of random characters, for example:
, which appear nonsensical but nevertheless allow the botmaster to manage their bots. While working on the detection of algorithmically generated domains (we have covered cases of their usage here and here), we found domains that look just as odd as those used in botnets but serve different, legitimate purposes. Identifying these domains helps eliminate a large number of false alarms in DGA botnet detection systems. In this entry we describe how such domains are used in a non-malicious way; in a future post we will look into cases that can be seen as threats.
Strange requests of AV software
First we look at domains generated by the McAfee GTI File Reputation system, which is used, for example, in McAfee's corporate AV solution and HIPS. It has been presented in detail in “DNS Noise: Measuring the Pervasiveness of Disposable Domains in Modern DNS Traffic” by Yizheng Chen et al. When a suspicious file (executable, PDF, APK) is detected and cannot be confirmed as malicious using the local detection base, a special DNS query is sent to the reputation system. The query contains basic information about the file and its hash, the version of the McAfee system, and information about the execution environment. All of this is encoded as higher-level labels of a domain name ending in one of the following suffixes:
The response contains an address from the 127.0.0.0/16 network, which encodes the reputation of the queried file.
Other queries that look suspicious are those generated by DNS blacklists. In this case the name is not just a strange group of characters; it can also be a multilevel domain name containing natural-language words. One example of such a blacklist is the Spamhaus DBL, which lets you ask whether a domain is malicious by sending a specially crafted DNS query. For example, querying for the IP address of gmail.com.dbl.spamhaus.org returns a response saying that this domain does not exist (an NXDomain error), which means that the domain is not on the blacklist.
Host gmail.com.dbl.spamhaus.org not found: 3(NXDOMAIN)
And when we ask about iqumgmcqwuqgaaus.org.dbl.spamhaus.org, we get the address 127.0.1.2 in response, which means that this domain could be spam-related.
iqumgmcqwuqgaaus.org.dbl.spamhaus.org has address 127.0.1.2
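The two lookups above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library: the helper names dbl_query_name and dbl_lookup are made up for illustration, and treating every resolution failure as "not listed" is a simplification, since socket.gaierror is raised for other resolver errors as well.

```python
import socket

DBL_ZONE = "dbl.spamhaus.org"  # zone used in the examples above


def dbl_query_name(domain, zone=DBL_ZONE):
    """Build the name to look up: the checked domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def dbl_lookup(domain, resolve=socket.gethostbyname):
    """Return the 127.0.x.y answer if the domain is listed, else None.

    Simplification: any resolution failure is treated as NXDomain.
    """
    try:
        return resolve(dbl_query_name(domain))
    except socket.gaierror:
        return None
```

The resolver is passed in as a parameter so that the function can be exercised without network access, for example with a stub that always returns 127.0.1.2.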
Another service that works in a similar manner is countries.nerd.dk, which provides country geolocation of hosts. The host's IP address is prepended to the DNS query name, and the country's ISO code is encoded in the returned address. For example:
18.104.22.168.zz.countries.nerd.dk has address 127.0.3.72
This response means that the queried host is located in the USA (3*256+72=840, and the ISO code of the US is 840).
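The arithmetic above is easy to automate. The sketch below assumes only what the example shows, namely that the last two octets of a 127.0.x.y answer carry the numeric country code as x*256 + y; the function name is hypothetical.

```python
import ipaddress


def iso_numeric_from_answer(answer):
    """Decode the numeric country code from a 127.0.x.y answer (x*256 + y)."""
    octets = ipaddress.IPv4Address(answer).packed
    if octets[0] != 127 or octets[1] != 0:
        raise ValueError(f"unexpected answer: {answer}")
    return octets[2] * 256 + octets[3]


print(iso_numeric_from_answer("127.0.3.72"))  # 840, as in the example above
```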
Internationalized Domain Names (IDNs)
The use of Internationalized Domain Names (IDNs) can be seen as another source of DNS lookups for strange domains. IDNs are described in RFC 3490. Some examples:
ąćęłńóśźż.pl or łódź.pl; DNS processes the latter as xn--d-uga0v4h.pl.
As you can see, regional names are transformed into ASCII strings that look random and somewhat strange. However, a distinctive feature of IDNs, the “xn--” prefix, shows that they have been encoded using Punycode. Unfortunately, this mechanism has already been used by attackers (as will be shown in the next entry).
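The transformation can be tried out with Python's built-in idna codec, which implements RFC 3490. The sketch below is illustrative only: the helper names are made up, and łódź.pl is used because it is the Unicode form of the xn--d-uga0v4h.pl name shown above. A detection system could use a check like is_idn to decode such names before applying any "randomness" heuristics.

```python
def to_ascii(name):
    """Encode a Unicode domain name to its ASCII (RFC 3490) form."""
    return name.encode("idna").decode("ascii")


def to_unicode(name):
    """Decode an ASCII domain name with xn-- labels back to Unicode."""
    return name.encode("ascii").decode("idna")


def is_idn(name):
    """True if any label carries the Punycode 'xn--' prefix."""
    return any(label.lower().startswith("xn--") for label in name.split("."))


print(to_ascii("łódź.pl"))
print(to_unicode("xn--d-uga0v4h.pl"))
print(is_idn("example.pl"))  # False
```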
The Chrome and Chromium browsers are a very good example of non-malicious DGA usage. After startup they send DNS queries for three random-looking domains. The main purpose is to check whether NXDomain responses are hijacked in a particular network. A string of random characters is placed both as a TLD (see figure below),
and as a second-level domain prepended to the host's local domain, for example .lan or .local_example
We have seen this behavior in desktop browsers (Chrome and Chromium on Ubuntu Linux, Chrome on Windows 7) and also in the mobile version (Android KitKat). In tests on Windows 7 Professional, additional LLMNR and NBNS queries were sent; however, there were no TLD queries.
The NXDomain hijack test is needed to check whether some features of the omnibox (the combined address bar and search bar) will work properly. For example, after a single word is entered into the omnibox, the browser does not know whether the user is requesting a local resource or typing a search phrase. In Chromium the search results are presented by default, but a query for the local resource is performed in the background. If it succeeds, a suggestion with a redirection link appears. So if NXDomain responses are hijacked, this suggestion mechanism will return incorrect results.
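The idea behind such a probe can be sketched in Python as follows. This is a simplified illustration, not Chromium's actual logic: the function names, the 7 to 15 character label length and the "all probes resolve" decision rule are assumptions made for the example.

```python
import random
import socket
import string


def random_label(rng=random, low=7, high=15):
    """Random lowercase label; the 7-15 length range is an assumption."""
    length = rng.randint(low, high)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def nxdomain_hijacked(resolve=socket.gethostbyname, probes=3):
    """Resolve random single-label names that should not exist.

    Returns True if every probe gets an answer, which suggests that
    NXDomain responses are being rewritten on this network.
    """
    for _ in range(probes):
        try:
            resolve(random_label())
        except OSError:  # socket.gaierror is a subclass of OSError
            return False
    return True
```

On a network that answers honestly, the random names fail to resolve and the function returns False; the resolve parameter allows the check to be tested with a stub instead of real DNS traffic.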
Network experiments and other queries for random-looking Google domains
Chromium provides some other examples of queries for strange domains. The browser asks about domains in the following zones:
Similar requests have been reported by the authors of the previously mentioned “DNS Noise…” paper. Judging by their structure, these queries are probably used for some IPv6 metrics.
Different queries contain
domain. They are sent by both Chrome and Chromium. We cannot yet say for certain what they are used for, but they are probably linked to application updates.
Content Delivery Networks
DGAs are often, though not always, used to generate domain names of CDN servers. The suffix of such domains is constant and only the prefixes are generated algorithmically. Some examples:
This is a very common non-malicious use of DGAs, as the utilization of CDNs keeps increasing.
Other sources of domains which could be misclassified
The last case can serve as an example of how detecting DGA botnet domains by lexical features depends on the natural language used. For example, domains in Polish and English respectively:
, which mean nearly the same, could be perceived as suspicious by a classifier (human or machine) that uses a natural language other than the one in which the domain was created. Another question is whether such domains could be generated by a DGA. Unfortunately, the example of the Matsnu botnet shows that there are algorithms that create domains very similar to those created directly by humans.
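The language dependence can be illustrated with a toy lexical feature. The sketch below is an assumption-laden example, not the feature set of any particular detector: it scores a label by the fraction of its character bigrams seen in a reference word list, and both the tiny English list and the sample words are hypothetical.

```python
def bigrams(word):
    """Set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def familiarity(label, reference_words):
    """Fraction of the label's bigrams that occur in the reference words."""
    known = set().union(*(bigrams(w) for w in reference_words))
    own = bigrams(label)
    return len(own & known) / len(own) if own else 0.0


# Hypothetical, tiny English reference list; a real model would use a large corpus.
ENGLISH = ["secure", "online", "banking", "service", "network", "weather"]

print(familiarity("online", ENGLISH))   # 1.0: every bigram is known
print(familiarity("zdrowie", ENGLISH))  # 0.0: Polish for "health" looks "random"
```

A model built only from English text therefore rates an ordinary Polish word as unfamiliar as a random string, which is exactly the kind of false alarm described above.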
The domains presented above are not a definitive list. Rather, they provide insight into the main categories and applications of legitimate DGAs. In this part we have shown their non-malicious use; in part two we will present cases in which DNS and random-looking domains can be considered manifestations of a threat.