The DGA of CoreBot


These are just unpolished notes. The content likely lacks clarity and structure, and the results might be incomplete or not adequately verified.

Recently, IBM’s Security X-Force researchers analysed and reported on a new banking trojan called CoreBot. They note that CoreBot features an inactive domain generation algorithm (DGA). The DGA has since been activated, as observed by Kleissner & Associates, who sinkholed some of the domains.

Since I couldn’t find a description of the DGA elsewhere, here is my short write-up of it. I looked at this sample provided by @benkow_ and referenced in a tweet by @Bry_Campbell. Here are the first 10 DGA domains from the malwr report:

Edit 2015-09-28: The analysed sample turned out to be a debug sample. I revised the post to highlight the difference.


The DGA is configured with the following routine init_dga_config:

Call to init_dga_config

The meanings of these values are:

  • charset_len: This is the length of the charset array containing ASCII characters used for the DGA. The actual array is initialized later (see below).
  • r: this is the random number, initialized to the hardcoded seed 1DB98930. Other samples of CoreBot use a different hardcoded seed; see the section Samples in the Wild.
  • len_l: this is the inclusive lower bound on the length of the subdomains of
  • len_u: this is the exclusive upper bound on the length of the subdomains of

The set of characters for the domains is initialized as follows:


This code fills the charset array with “abcdefghijklmnopqrstuvwxy012345678”. Note that “z” and “9” are missing due to an off-by-one error. This bug seems to be widespread among VXers: Necurs, Ramnit, and Ranbyus all have similar errors that lead to missing “z”s. Edit 2015-09-17: Tinba, Geodo/Emotet, and Cryptolocker also have the missing “z” problem, thanks to Daniel Plohmann for pointing that out.
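
The off-by-one is easy to reproduce. The following is a minimal sketch, not the malware's actual code: it assumes the charset is filled by two loops whose upper bounds are treated as exclusive, which is the same effect the Python script in this post gets from its range calls. The helper name build_charset is made up for illustration.

```python
def build_charset(inclusive=False):
    """Build the DGA alphabet; hypothetical helper for illustration only."""
    # inclusive=True shows what an inclusive upper bound would have produced
    extra = 1 if inclusive else 0
    letters = [chr(c) for c in range(ord('a'), ord('z') + extra)]
    digits = [chr(c) for c in range(ord('0'), ord('9') + extra)]
    return "".join(letters + digits)

buggy = build_charset()
full = build_charset(inclusive=True)
print(buggy, len(buggy))
# characters lost to the exclusive upper bounds
print(sorted(set(full) - set(buggy)))
```

With the exclusive bounds the alphabet has 34 characters instead of 36, which matches the charset listed above.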


The DGA is time dependent. The time is determined by making an HTTP request to

Request to Google

… and querying the date and time with the WinHTTP function WinHttpQueryHeaders:

Systemtime from Google's Response Header
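
For readers who want to replay this step outside of Windows: the sample reads the date through WinHTTP, but the same information can be pulled from any HTTP Date header. The sketch below only shows the parsing part in Python; the function name date_from_header and the header value are made up for illustration and are not taken from the sample.

```python
from email.utils import parsedate_to_datetime

def date_from_header(date_header):
    """Return (year, month, day) from an HTTP Date header value."""
    dt = parsedate_to_datetime(date_header)
    return dt.year, dt.month, dt.day

# Example header value, invented for this sketch
year, month, day = date_from_header("Tue, 08 Sep 2015 12:34:56 GMT")
print(year, month, day)
```

Only the year, month and day are of interest for the DGA; the time of day is discarded.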

My sample later overwrites the day with 8. While this could be to reduce the granularity of the DGA from days to months, it is more likely a debugging measure:

Overwriting the day

The next screenshot shows another sample that doesn’t overwrite the day. Notice that the offsets line up nicely; the two samples are equal except for the removed “day ← 8” statement.

Without debug

Apart from the year, month, and day (set to 8), there is a fourth value used for seeding. This value is stored as a configuration value

reading the group

In my sample the returned value was NULL, and the group was set to 1. I have yet to see a sample that uses the config value.

The year, month, day (set to 8) and the group are then applied to the random number:


The above disassembly boils down to:

r = r + year + ((group << 16) + (month << 8) | day)
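
As a worked example, here is the seeding step evaluated for the debug sample: seed 1DB98930, group 1, and an assumed date of 2015-09-08 chosen only for illustration. The function name seed_rng is mine, and the 32-bit mask mirrors what the Python script in this post does.

```python
def seed_rng(seed, year, month, day, group=1):
    """Mix date and group into the hardcoded seed (sketch of the formula above)."""
    # '+' binds tighter than '|' in Python, so the day is OR-ed in last
    mix = ((group << 16) + (month << 8)) | day
    return (seed + year + mix) & 0xFFFFFFFF

r = seed_rng(0x1DB98930, 2015, 9, 8)
print(hex(r))  # 0x1dba9a17
```

Since the group and month are shifted past the low byte, OR-ing in the day is equivalent to adding it for any valid day value.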


The DGA itself is very simple. It generates up to 40 subdomains (configurable with core.dga.domains_count) using the common linear congruential generator with multiplier 1664525 and increment 1013904223:

the dga

The disassembly decompiles to:

r = (1664525*r + 1013904223) & 0xFFFFFFFF
domain_len = len_l + r % (len_u - len_l)
domain = ""
for i in range(domain_len):
    r = ((1664525 * r) + 1013904223) & 0xFFFFFFFF
    domain += charset[r % charset_size]
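
A quick sanity check of the length calculation, as a sketch: the bounds 0xC and 0x18 are the ones used in the Python script in this post, and the starting state is an arbitrary value picked for the experiment. The names lcg_next, LEN_L and LEN_U are mine.

```python
def lcg_next(r):
    """One step of the linear congruential generator, truncated to 32 bits."""
    return (1664525 * r + 1013904223) & 0xFFFFFFFF

LEN_L, LEN_U = 0xC, 0x18  # inclusive lower and exclusive upper bound

r = 0x1DBA9A17  # arbitrary starting state for this check
lengths = set()
for _ in range(10000):
    r = lcg_next(r)
    lengths.add(LEN_L + r % (LEN_U - LEN_L))

# every observed length lies in the range 12..23
print(min(lengths), max(lengths))
```

Because the upper bound is exclusive, the longest possible subdomain has 23 characters, not 24.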

Python Code

The following Python code generates the domains for any given date. It takes the following arguments:

  • -s, --seed: the seed as a hex string. If none is provided, the script uses 1DBA8930
  • -d, --date: the date for which to generate the domains, in the format YYYY-MM-DD. If none is provided, then the current date is used. To get the domains for the debug sample, use the next option, --debug.
  • -t, --debug: overwrite the day with 8 like the debug sample in this blog post does.
  • -n, --nr: number of domains to generate, default 40.

You can also find the code in my GitHub repository:

import argparse
from datetime import datetime

def init_rand_and_chars(year, month, day, nr_b, r):
    # mix the date and the group (nr_b) into the hardcoded seed
    r = (r + year + ((nr_b << 16) + (month << 8) | day)) & 0xFFFFFFFF
    # range() excludes its upper bound, reproducing the missing 'z' and '9'
    charset = [chr(x) for x in range(ord('a'), ord('z'))] +\
            [chr(x) for x in range(ord('0'), ord('9'))]
    return charset, r

def generate_domain(charset, r):
    len_l = 0xC
    len_u = 0x18
    # the first LCG step determines the length, the following ones the characters
    r = (1664525*r + 1013904223) & 0xFFFFFFFF
    domain_len = len_l + r % (len_u - len_l)
    domain = ""
    for i in range(domain_len, 0, -1):
        r = ((1664525 * r) + 1013904223) & 0xFFFFFFFF
        domain += charset[r % len(charset)]
    domain += ""  # the fixed top and second level domain is appended here
    print(domain)
    return r

if __name__=="__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--seed", help="seed", default="1DBA8930")
    parser.add_argument("-d", "--date", help="date for which to generate domains")
    parser.add_argument("-t", "--debug", help="debug DGA (day set to 8)",
        action="store_true")
    parser.add_argument("-n", "--nr", help="nr of domains to generate",
        type=int, default=40)
    args = parser.parse_args()
    # fall back to the current date if none is given
    d = datetime.strptime(args.date, "%Y-%m-%d") if args.date else datetime.now()
    day = 8 if args.debug else d.day

    charset, r = init_rand_and_chars(d.year, d.month, day, 1,
            int(args.seed, 16))
    for _ in range(args.nr):
        r = generate_domain(charset, r)

Samples in the Wild

The sample in this blog post (first entry in the following table) turns out to be a special case: the day is set to 8 for debugging purposes, and the seed differs slightly from that of the “production” samples. All other samples share the same seed.

MD5                                    seed       debug
cb345ee48e811219387ffcd0d76788f2       1DB98930   yes (1)
cc09ad01ce6785d287724f2f877a91f8       1DBA8930   no
2f46770e63abd90d24031ff88b6a46f5       1DBA8930   no
34f36f4ec445755d6e24203f81e562e8       1DBA8930   no
feea363fb52213c72e5876cc8b5f8831       1DBA8930   no
5c0b4d07949be6ed2035def1e8fcd85e       1DBA8930   no
09da58404d000cad3daea72a5782bb00       1DBA8930   no
c40a5db6c20ba4316edd64d612481c41 (2)   1DBA8930   unknown (3)
67a248a56380865c85f902729b0d9944       1DBA8930   no

(1) meaning the day is set to 8.
(2) MD5 sum of the JavaScript that dropped CoreBot.
(3) the sample was submitted on September 8th.


The following table summarizes the properties of CoreBot’s DGA:

property                      value
seed                          magic number and current date
granularity                   1 day
domains per seed and day      40
sequence                      sequential
wait time between domains     none
top and second level domain
third level characters        lower case letters except ‘z’, digits except ‘9’
third level length            12 to 23 characters


The DGA in this blog post has been implemented by the DGArchive project.