How non-existent domain names can unveil DGA botnets

Date of publication: 01/10/2015, piotrb


Domain Generation Algorithms (DGAs) are used in botnets to make it harder to block connections to Command & Control (C&C) servers and to make it difficult to take over the botnet infrastructure. The main objective of these algorithms is to generate a large number of different domain names, which usually look random, like pkjdgjwzcr.pl. Only some of them are registered by the botmaster; however, compromised hosts tend to query all of them until they find a working domain. As a result, bots can receive a large number of non-existent domain name responses (in short: NXDomain). In this entry we will show how such behavior can be used to detect DGA botnets, using examples of different detection methods.

Non-existent domain names in DGA botnets

When a bot tries to connect to a C&C server, it sends a DNS request for the appropriate domain name. If the domain is not registered by the botmaster, the DNS server responds that the domain does not exist. In DGA botnets, a list of possible domains is generated and then used by the bot, but only a few of them point to the IP address of a C&C server. Usually the bot checks all domains in the list until it finds one that works or reaches the end of the list. The scheme of such communication is presented in the figure below:

[Figure: DNS communication of a DGA bot querying a list of generated domain names]
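To make this behavior concrete, here is a toy sketch in Python. The generator `generate_domains` is hypothetical and does not reproduce the algorithm of any real malware family; `resolve` is an assumed stand-in callback (returning an IP address string or `None`) so that the example does not touch real DNS.

```python
import hashlib
from datetime import date


def generate_domains(day: date, count: int = 10, tld: str = "pl"):
    """Toy DGA (hypothetical): derive pseudo-random labels from the date.

    Bot and botmaster can both run this, so the botmaster only has to
    register one of the names for a given day.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}-{i}".encode()).digest()
        # Map the first 10 bytes of the hash to lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:10])
        domains.append(f"{label}.{tld}")
    return domains


def find_cc(domains, resolve):
    """Query domains in order until one resolves.

    Returns (domain, address, nx_count), where nx_count is the number of
    NXDomain answers the bot received before success (or in total).
    """
    nx_count = 0
    for domain in domains:
        address = resolve(domain)
        if address is not None:
            return domain, address, nx_count
        nx_count += 1
    return None, None, nx_count
```

For example, if only the eighth generated name is registered, `find_cc(domains, {domains[7]: "192.0.2.1"}.get)` finds it after seven NXDomain answers, which is exactly the noise the methods below look for.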

A high level of NXDomain responses can be considered an anomaly and detected by monitoring systems. However, such errors are sometimes generated by benign applications, for example by the Chrome browser, which at startup queries randomly generated hostnames to check whether the network's DNS servers hijack NXDomain responses. Because of that, detection systems should be able to differentiate between benign and malicious NXDomain responses, for example by incorporating a whitelist mechanism.
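A minimal sketch of NXDomain anomaly detection with a whitelist, assuming DNS responses are available as `(host, domain, rcode)` tuples. The suffix list, the rule that single-label names (no dot, so they never resolve in public DNS) are ignored, and the threshold of 20 are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical whitelist of suffixes known to produce benign NXDomain noise.
WHITELIST_SUFFIXES = (".local", ".in-addr.arpa")


def is_whitelisted(domain, suffixes=WHITELIST_SUFFIXES):
    """Return True for names whose NXDomain answers should be ignored."""
    d = domain.lower().rstrip(".")
    # Hypothetical rule: single-label names are treated as benign noise.
    return "." not in d or d.endswith(suffixes)


def nxdomain_anomalies(responses, threshold=20):
    """Return hosts with at least `threshold` non-whitelisted NXDomain answers.

    responses: iterable of (host, domain, rcode), rcode e.g. "NXDOMAIN".
    """
    counts = Counter(
        host
        for host, domain, rcode in responses
        if rcode == "NXDOMAIN" and not is_whitelisted(domain)
    )
    return {host for host, n in counts.items() if n >= threshold}
```

A fixed count threshold is the simplest possible rule; the systems described below replace it with per-host reputation, graph analysis or clustering.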

NXDomain as an anomaly

The DGA botnet detection process can be based exclusively on the analysis of NXDomain responses, as in the system proposed by S. Krishnan et al. It uses NXDomain messages to perform sequential hypothesis testing and thereby compute a reputation score for every host that receives such a message. If the response concerns a known domain zone (i.e., one previously queried by that host), the reputation score is updated in the direction of the benign class; otherwise, the score is updated to indicate that the host behaves like a bot. The basic principle of detection is simple: a host that queries many previously unseen non-existent domains is suspicious.
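A minimal sketch of this per-host scoring, written as a textbook Wald sequential probability ratio test (SPRT). The probabilities `THETA_0` / `THETA_1`, the error rates `ALPHA` / `BETA` and the naive two-label `zone_of` helper are hypothetical choices for illustration, not values or code from Krishnan et al.; their actual features and parameters may differ.

```python
import math
from dataclasses import dataclass, field

# Hypothetical probability that an NXDomain answer concerns a previously
# unseen zone, for a benign host (THETA_0) and for a bot (THETA_1).
THETA_0 = 0.2
THETA_1 = 0.9
ALPHA = 0.01  # tolerated false-positive rate
BETA = 0.01   # tolerated false-negative rate

UPPER = math.log((1 - BETA) / ALPHA)  # declare "bot" at or above this score
LOWER = math.log(BETA / (1 - ALPHA))  # declare "benign" at or below this score


@dataclass
class HostState:
    known_zones: set = field(default_factory=set)
    score: float = 0.0  # log-likelihood ratio: log P(obs | bot) / P(obs | benign)


def zone_of(domain: str) -> str:
    """Naive zone extraction: the last two labels (ignores public suffixes)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def record_query(state: HostState, domain: str) -> None:
    """Remember the zone of any domain the host has queried."""
    state.known_zones.add(zone_of(domain))


def observe_nxdomain(state: HostState, domain: str) -> str:
    """Update the host's score with one NXDomain answer and return a verdict."""
    zone = zone_of(domain)
    if zone in state.known_zones:
        # Known zone: evidence for the benign hypothesis (score decreases).
        state.score += math.log((1 - THETA_1) / (1 - THETA_0))
    else:
        # Unseen zone: evidence for the bot hypothesis (score increases).
        state.score += math.log(THETA_1 / THETA_0)
    state.known_zones.add(zone)  # repeated failures on one zone count once as "new"
    if state.score >= UPPER:
        return "bot"
    if state.score <= LOWER:
        return "benign"
    return "pending"
```

With these parameters a host needs four NXDomain answers for distinct unseen zones before it is labeled a bot, while three failures inside a zone it already uses are enough to label it benign. In a real SPRT the test stops once a boundary is crossed; this sketch simply keeps updating.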

Failure graphs

Some botnet detection systems use NXDomain responses to build DNS failure graphs. Such graphs represent relationships between domain names and the hosts that queried them: vertices represent hosts and domain names, and an edge connects a host and a domain only when that host's DNS query for that domain failed. This approach was used by N. Jiang et al. By applying a decomposition algorithm to failure graphs, they obtain coherent co-clusters, which are further processed and analyzed to provide information about the type of observed malware. An example of a bi-mesh subgraph of the Conficker.A botnet found by the authors is presented in the figure below (taken from the original paper).

[Figure: bi-mesh subgraph of the Conficker.A botnet, from the paper by N. Jiang et al.]
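The idea of a failure graph can be sketched as below. This is not the decomposition algorithm of Jiang et al.: as a much simpler stand-in, it builds the bipartite host/domain graph, splits it into connected components and keeps those components that are bi-meshes (every host failed on every domain). The size limits `min_hosts` and `min_domains` are hypothetical.

```python
from collections import defaultdict


def build_failure_graph(failures):
    """Build a bipartite DNS failure graph.

    failures: iterable of (host, domain) pairs, one per failed DNS query.
    Vertices are tagged tuples ("host", name) or ("domain", name), so a host
    and a domain with the same string never collide.
    """
    adj = defaultdict(set)
    for host, domain in failures:
        h, d = ("host", host), ("domain", domain)
        adj[h].add(d)
        adj[d].add(h)
    return adj


def connected_components(adj):
    """Return (hosts, domains) for every connected component (iterative DFS)."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for n in adj[v]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        hosts = sorted(name for kind, name in comp if kind == "host")
        domains = sorted(name for kind, name in comp if kind == "domain")
        result.append((hosts, domains))
    return result


def suspicious_co_clusters(failures, min_hosts=2, min_domains=2):
    """Keep components in which every host failed on every domain (bi-mesh)."""
    adj = build_failure_graph(failures)
    result = []
    for hosts, domains in connected_components(adj):
        if len(hosts) < min_hosts or len(domains) < min_domains:
            continue
        if all(("domain", d) in adj[("host", h)] for h in hosts for d in domains):
            result.append((hosts, domains))
    return result
```

Two infected hosts failing on the same generated names form a bi-mesh, while a single user's typo forms a tiny component that is discarded.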

Narrowing the analysis

Botnet detection systems can prefilter the hosts selected for analysis on the basis of NXDomain responses, so that only hosts that query non-existent domains are processed. For example, the system of H. Guerid et al. uses such a mechanism to select hosts that are suspicious in terms of the number of NXDomain responses they receive. After the prefiltering step, hosts are grouped by the similarity of the non-existent domains they queried. Then, if a successfully queried domain name is shared among the members of a particular group, it is considered suspicious. Benign, popular domains are filtered out by checking whether hosts from other groups also query them. Eventually a list of active C&C domains is obtained.
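The three stages can be sketched as follows. This is a hypothetical simplification, not the clustering used by Guerid et al.: the thresholds `min_nx` and `min_sim`, the greedy Jaccard grouping, and the rule that a candidate must be resolved by every group member are all assumptions. As a further simplification, the popularity filter treats every host outside the group (not only hosts in other groups) as the reference population.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_cc_candidates(nx, resolved, min_nx=3, min_sim=0.5):
    """Return candidate C&C domains.

    nx:       host -> set of domains that returned NXDomain
    resolved: host -> set of domains that resolved successfully
    """
    # 1. Prefilter: analyze only hosts with many NXDomain responses.
    suspects = [h for h in sorted(nx) if len(nx[h]) >= min_nx]

    # 2. Greedy grouping: join the first group whose first member has a
    #    similar set of NXDomain names, otherwise start a new group.
    groups = []
    for host in suspects:
        for group in groups:
            if jaccard(nx[host], nx[group[0]]) >= min_sim:
                group.append(host)
                break
        else:
            groups.append([host])

    # 3. Domains resolved by every member of a group, minus popular domains
    #    that hosts outside the group also resolved.
    candidates = set()
    for group in groups:
        if len(group) < 2:
            continue
        members = set(group)
        shared = set.intersection(*(resolved.get(h, set()) for h in group))
        outside = set().union(*(names for h, names in resolved.items() if h not in members))
        candidates |= shared - outside
    return candidates
```

In a small example with two bots sharing most of their failed names, a popular domain such as `google.com` resolved by the bots and by an ordinary workstation is dropped, and only the domain resolved exclusively by the group remains as a C&C candidate.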

One of the features

NXDomain response analysis does not have to be the only component of a detection system. R. Sharifnya et al. presented a system similar to that of Krishnan et al. (discussed above), in which hosts are also assigned a reputation score. However, the authors do not rely solely on NXDomain messages; they also check lexical features of the queried domain names and correlate behavior with other hosts in the network. In this way, the system can be tuned to the conditions of a specific network and can use both group activity and the behavior of single hosts for detection.
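To illustrate combining NXDomain evidence with lexical features, the sketch below computes a few simple features of a domain's second-level label (length, Shannon entropy, digit and vowel ratios) and blends the mean normalized entropy with a host's NXDomain ratio. The feature set, the naive `label_of` split, the weights and the `max_entropy` normalization are hypothetical; they are not the features or scoring used by Sharifnya et al., and the correlation with other hosts is not modeled.

```python
import math
from collections import Counter


def label_of(domain):
    """Naive second-level label, e.g. "pkjdgjwzcr" for "pkjdgjwzcr.pl"."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def shannon_entropy(s):
    """Shannon entropy of the character distribution of s, in bits."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def lexical_features(domain):
    label = label_of(domain)
    n = len(label) or 1  # avoid division by zero for empty labels
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }


def host_score(nx_ratio, domains, w_nx=0.5, w_lex=0.5, max_entropy=4.0):
    """Hypothetical reputation score in [0, 1]; higher means more bot-like.

    nx_ratio: fraction of the host's DNS answers that were NXDomain.
    domains:  names the host queried.
    """
    if domains:
        lexical = sum(
            min(lexical_features(d)["entropy"] / max_entropy, 1.0) for d in domains
        ) / len(domains)
    else:
        lexical = 0.0
    return w_nx * nx_ratio + w_lex * lexical
```

Random-looking DGA labels such as `pkjdgjwzcr` have higher character entropy and fewer vowels than typical human-chosen names, so a host with both many NXDomain answers and such names scores higher than one showing only one of the two signals.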

Conclusion

As we have shown with the examples above (note that we presented only some of the detection methods; this is not a full survey), NXDomain responses can be used effectively in DGA botnet detection systems. They should be treated as one of many detection features, not the only one. The main reason is that botnets are not the only source of NXDomain errors: numerous legitimate sources add noise and can raise false alarms. Furthermore, botnets can use evasion techniques, for example by reducing the number of queried domains or by issuing queries at variable time intervals.