Estimating the size of botnets in Poland

Date of publication: 19/05/2014, CERT Polska

The annual CERT Polska report will soon be available for download on our website. This year we decided not only to include statistical data (which will be moved to a separate section), but also to describe the trends and events of the last year that we considered important. While you wait for the report, you can read a short fragment of it below. It describes the method we used to estimate botnet size and the results of these estimates. Some of the referenced material has been removed to improve readability, but it will be available in the final version of the report.

Estimating botnet size in Poland

Estimating the size of botnets is very difficult. The numbers cited are often staggering, are based on unclear rules, and do not accurately indicate the scale of the problem. This year, based on our own data and data received from external sources, we attempted to measure the real size of the botnets. The problem and the methods are presented at the end of this blog entry; the results of our estimates are described below.

The number of infected computers presented in the report has nothing in common with the figures shown in the Eurostat report, which states that 30% of Polish users had contact with an infected computer. In that case we assume that users responding to the survey questions relied mainly on information from their antivirus software.

This leads to several problems of interpretation. First, antivirus software can block infections. According to Microsoft, about 20% of computers in Poland have been exposed to malware, although the infection was blocked by an antivirus program. However, a blocked attempt does not mean that the user would have been infected without an antivirus program: the file containing the malware could have been attached to an email message that the user would never have opened anyway.

The second problem comes from a misunderstanding of the definition of malware: average users are not able to distinguish non-standard software behavior from real malware. Some of them may even identify a phishing attack as malware. What is more, users are not the only ones who have problems with the definition of malicious software. Antivirus engines may also detect benign files and treat them as unwanted software.

Of course, there is also another side of this issue: we do not know what we don’t know, i.e. we do not know about the threats that have not been detected by antivirus systems. Based on the data received by CERT Polska and the methodology described later in this entry, we think that there are about 169,900 infected computers per day in Poland, which represents about 1.5% of all the computers in Polish households. The number of computers in Polish households is estimated to be around 11 million, with details available in the final report. This gives an approximate infection rate.

Our data about botnets come from our sinkhole and honeypot systems and our P2P botnet crawler, not from antivirus systems. As a result, they are usually accurate and contain very few false positives.

The nature of the collected data and the methodology we developed allow us to estimate that there are no more than 300,000 infected computers active per day in Poland. Moreover, it is worth noting that there is a growing number of small botnets consisting of several thousand machines. Cybercriminals are trying to specialize in attacking the users who are most attractive to them, rather than simply infecting every computer they can.

Our results are close to the values presented in the Microsoft report on Polish infection rates. According to that report, the infection rate for Poland is between 0.56% and 0.78%, while for the world it varies between 0.53% and 0.63%, depending on the quarter considered. These values may be underestimated because they do not include people who do not use antivirus solutions, whose computers are in theory more vulnerable to infection.

The table below presents the most prevalent malware families in Poland.

Rank  Name                            IPs      Percentage
1     Conficker                       45,521   26.79%
2     Sality                          24,080   14.17%
3     ZeroAccess                      19,025   11.20%
4     Virut                           15,063    8.87%
5     ZeuS (incl. Citadel and alike)  12,193    7.18%
-     Others                          54,018   31.79%
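
The percentages in the table, and the household infection rate quoted earlier, can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the table, the 11 million figure is the household estimate quoted in the text, and the variable names are ours.

```python
# Daily infected IP counts per malware family, copied from the table above.
family_ips = {
    "Conficker": 45_521,
    "Sality": 24_080,
    "ZeroAccess": 19_025,
    "Virut": 15_063,
    "ZeuS (incl. Citadel and alike)": 12_193,
    "Others": 54_018,
}

# Estimated number of computers in Polish households (figure quoted in the text).
HOUSEHOLD_COMPUTERS = 11_000_000

total_ips = sum(family_ips.values())

# Share of each family in the daily total, in percent.
shares = {name: 100 * count / total_ips for name, count in family_ips.items()}

# Infected machines as a percentage of household computers.
infection_rate = 100 * total_ips / HOUSEHOLD_COMPUTERS

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share:.2f}%")
    print(f"total: {total_ips}, infection rate: {infection_rate:.2f}%")
```

The family counts add up to the 169,900 daily total, and dividing that total by 11 million gives roughly the 1.5% mentioned in the text.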

 

How to accurately estimate the botnet size?

Different institutions use various methods for estimating the size of botnets. Any organization taking over a botnet has a reason to claim that the botnet is very big, or even the biggest, which leads to many different methods of counting. We will try to choose the method that is easiest to employ and still estimates the size of a botnet correctly. To understand the problem, it is necessary to present the obstacles connected with estimating botnet size. We will consider a few possible methods and describe how each of them determines the resulting size estimate. We do not take into consideration the method of collecting botnet data, whether it is sinkholing or passive observation. It does of course affect the estimate as well, but most institutions choose a collection method that is both possible to implement and accurate.

Unique IP addresses

A common way of estimating the size of a botnet is to count unique IP addresses. However, this approach assumes that a given IP address always represents one and the same computer. This assumption is not true. According to the ENISA report, in many cases Internet providers use dynamic IP addresses. These addresses change from time to time, for example when a user restarts the modem or access router. Hence, a given user gets a different IP address every once in a while, and one bot can be counted multiple times when we count only unique IP addresses. In one case, a bot changed its IP address 694 times within 10 days. To limit this effect, unique IP addresses can be counted per day, assuming that most computers change their IP address on average once a day.
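
The difference between the two ways of counting can be illustrated with a small Python sketch. The log format, the function names and the sample records are hypothetical (the addresses come from the ranges reserved for documentation); this is not the tooling used to produce the figures in this entry.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sinkhole log: (day, source IP) for every observed connection.
# Two bots are simulated: one with a stable address, one changing it daily.
connections = [
    (date(2014, 1, 18), "192.0.2.10"),
    (date(2014, 1, 18), "192.0.2.10"),    # same bot, same address, same day
    (date(2014, 1, 18), "198.51.100.7"),
    (date(2014, 1, 19), "192.0.2.44"),    # first bot after an address change
    (date(2014, 1, 19), "198.51.100.7"),
    (date(2014, 1, 20), "203.0.113.5"),   # first bot, yet another address
    (date(2014, 1, 20), "198.51.100.7"),
]


def unique_ips_total(records):
    """Count distinct IP addresses over the whole observation period."""
    return len({ip for _, ip in records})


def unique_ips_per_day(records):
    """Count distinct IP addresses separately for each day."""
    per_day = defaultdict(set)
    for day, ip in records:
        per_day[day].add(ip)
    return {day: len(ips) for day, ips in per_day.items()}


if __name__ == "__main__":
    print("whole period:", unique_ips_total(connections))
    print("per day:", unique_ips_per_day(connections))
```

In this toy sample the whole-period count reports four addresses for two bots, while every daily count reports two.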

On the other hand, many networks use network address translation (NAT), which allows multiple network devices to share one public IP address. It is used by companies that do not need a separate public IP address for each workstation. Home users do something similar when they install a router to provide Internet access for several devices (e.g. over WiFi). In this case several bots can be located behind one IP address, but only one will be counted. There is no easy way to compensate for this; only bot IDs can be used to do that.
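
A minimal sketch of how bot IDs expose machines hidden behind NAT, assuming each observed connection carries both the source IP and a bot ID. The record layout, the `bots_per_ip` helper and the sample values are illustrative assumptions, not part of our tooling.

```python
from collections import defaultdict

# Hypothetical one-day sample of (source IP, bot ID) pairs.
observations = [
    ("192.0.2.10", "bot-a"),
    ("192.0.2.10", "bot-b"),    # second infected machine behind the same NAT
    ("192.0.2.10", "bot-a"),    # repeated check-in, must not be counted twice
    ("198.51.100.7", "bot-c"),
]


def bots_per_ip(records):
    """Map each public IP address to the set of bot IDs seen behind it."""
    bots = defaultdict(set)
    for ip, bot_id in records:
        bots[ip].add(bot_id)
    return dict(bots)


by_ip = bots_per_ip(observations)

# What counting IP addresses alone would report.
unique_ips = len(by_ip)

# What the bot IDs reveal.
unique_bots = len({bot for ids in by_ip.values() for bot in ids})

if __name__ == "__main__":
    print(unique_ips, "IP addresses,", unique_bots, "bots")
```

Here counting IP addresses alone would report two bots, while the IDs show three.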

Bot IDs

The second method of determining the size of a botnet is to use bot IDs. In some cases, such as Citadel (described previously in our report about the Citadel plitfi botnet), this ID is reasonably unique. Unfortunately, in other cases, like Virut, the ID that was meant to be unique proved to be completely useless. It was generated from the Volume Serial Number, which can be changed by the user, and even when it has not been changed it does not identify the machine reliably.

What is the botnet size?

We suggest, along with the researchers from Johns Hopkins University and the authors of the ENISA report mentioned previously, treating “the size of a botnet” not as a fixed term but as one that depends on the methodology used in a particular case. The Microsoft report separates two concepts: computers cleaned per mille (CCM, or infection rate), which is the number of computers on which malicious software was detected, and encounter rate, which is the number of computers on which antivirus solutions were able to block an infection.

The differences between size estimates obtained with various methods are significant. As an example we will analyze three different cases. The first one estimates the size of the Torpig botnet on the basis of a ten-day observation:

    • 1,247,642 unique IP addresses,
    • 182,800 unique bot IDs,
    • 179,866 unique IP addresses per day (on average),
    • about 200,000 unique addresses per day (maximum),
    • about 125,000 unique bot IDs (total maximum).

Assuming that the generated IDs are unique, or at least unique enough, we can conclude that estimating the size of the Torpig botnet on the basis of unique IP addresses over the whole period would overestimate it more than sixfold. On the other hand, although the number of unique IPs per day differs by more than 36% from the number of unique bots per day, it estimates the total size of the botnet quite well.

The second case is an attempt to estimate the size of one of the instances of the Citadel plitfi botnet based on a 25-day observation:

    • 164,323 unique IP addresses,
    • 11,730 unique IDs.

The observation period was even longer, so the number of unique IP addresses differs even more from the actual size of the botnet. The number of unique IPs connecting with the server was 14 times larger than the real number of bots.
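
The two overestimation factors mentioned above follow directly from the quoted figures. The short Python sketch below only repeats that arithmetic; the dictionary layout and the function name are ours.

```python
# Figures quoted in the text for the two observation periods.
torpig = {"unique_ips": 1_247_642, "unique_bot_ids": 182_800}
citadel_plitfi = {"unique_ips": 164_323, "unique_bot_ids": 11_730}


def overestimation_factor(case):
    """How many times the unique IP count exceeds the unique bot ID count."""
    return case["unique_ips"] / case["unique_bot_ids"]


torpig_factor = overestimation_factor(torpig)
citadel_factor = overestimation_factor(citadel_plitfi)

if __name__ == "__main__":
    print(f"Torpig: {torpig_factor:.2f}x, Citadel plitfi: {citadel_factor:.2f}x")
```

The Torpig ratio comes out between six and seven, and the Citadel plitfi ratio at about fourteen, in line with the statements above.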

The third case relates to the connections with our sinkhole server from various instances of botnets based on the Citadel malware. The results of the measurements carried out within 10 randomly selected days are presented in the table below.

Date            IPs     Bot IDs
18.01.2014      3323    2288
19.01.2014      3671    2376
20.01.2014      3963    2414
21.01.2014      4459    2341
04.02.2014      3238    2268
05.02.2014      3217    2230
06.02.2014      3486    2290
07.02.2014      3306    2180
08.02.2014      3092    2094
09.02.2014      3311    2174
Total (unique)  19768   3429
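
The relations between the daily and whole-period figures in the table can be checked with a short Python sketch. The numbers are copied from the table; the variable names and the choice of summary statistics are ours.

```python
# Daily counts copied from the table above: date -> (unique IPs, unique bot IDs).
daily = {
    "18.01.2014": (3323, 2288),
    "19.01.2014": (3671, 2376),
    "20.01.2014": (3963, 2414),
    "21.01.2014": (4459, 2341),
    "04.02.2014": (3238, 2268),
    "05.02.2014": (3217, 2230),
    "06.02.2014": (3486, 2290),
    "07.02.2014": (3306, 2180),
    "08.02.2014": (3092, 2094),
    "09.02.2014": (3311, 2174),
}

# Last row of the table: values that are unique over all ten days.
# These are not column sums, because the same IP or bot ID recurs across days.
TOTAL_UNIQUE_IPS = 19768
TOTAL_UNIQUE_BOT_IDS = 3429

max_ips_per_day = max(ips for ips, _ in daily.values())
max_bots_per_day = max(bots for _, bots in daily.values())

# Average daily ratio of unique IPs to unique bot IDs.
mean_daily_ratio = sum(ips / bots for ips, bots in daily.values()) / len(daily)

# The same ratio computed over the whole observation period.
total_ratio = TOTAL_UNIQUE_IPS / TOTAL_UNIQUE_BOT_IDS

if __name__ == "__main__":
    print(f"max IPs/day: {max_ips_per_day}, max bot IDs/day: {max_bots_per_day}")
    print(f"mean daily ratio: {mean_daily_ratio:.2f}, whole-period ratio: {total_ratio:.2f}")
```

On a single day the IP count stays within about a factor of two of the bot ID count, whereas over the whole period unique IP addresses outnumber unique bot IDs almost sixfold.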

According to research by Arbor Networks, NAT can in some cases hide up to 100 different infected computers behind a single IP address. In the data collected from our sinkhole server we observed up to four different infected computers behind a single IP address, and on average there were 1.02 bots per IP address. One computer changed its IP address 858 times within 4 days, while on average one computer had 5.90 different IP addresses over the whole observation period. The total number of unique IP addresses is almost 6 times larger than the number of unique bot IDs.

Although the number of unique IP addresses per day is larger than the number of unique IDs per day, it still gives a correct estimate of the size of the botnet. It should be noted that for some malware families, bots do not generate unique IDs. Therefore, in our opinion, the maximum number of unique IP addresses per day is a measure that approximates the real size of a botnet quite well.