Karton Gems 3: Malware extraction with malduck

Table of contents

  1. Getting Started
  2. Your first karton
  3. Malware extraction with malduck

Introduction

Today we'll continue the topics started in the first part of the tutorial. We'll learn about malduck: what it can do and how to write your own modules. Later we'll also show how to integrate it with Karton by using karton-config-extractor.

Mal-🦆

Malduck is a utility library designed for malware researchers. The most important features include:

  • Extraction engine (modular extraction framework for config extraction from files/dumps)
  • Cryptography (AES, Blowfish, Camellia, ChaCha20, Serpent and many others)
  • Compression algorithms (aPLib, gzip, LZNT1 (RtlDecompressBuffer))
  • Memory model objects (work on memory dumps, PE/ELF, raw files and IDA dumps with the same code)
  • Fixed integer types (like Uint64) and bitwise utilities
  • String operations (chunks, padding, packing/unpacking etc.)
  • Hashing algorithms (CRC32, MD5, SHA1, SHA256)

In this tutorial, we'll focus on the first one - the extraction engine. But we'll also showcase other library features in the code snippets.

To get a better overview of the library, check out code examples in the README or the official documentation.

Now, install malduck in a temporary virtual environment:

$ cd $(mktemp -d)
$ python3 -m venv venv
$ source ./venv/bin/activate
$ pip install malduck
$ malduck --version
malduck, version 4.1.0

Your first extractor module

First, download a malware sample for tests. You can download one from our GitHub. Don't worry, it's a memory dump and won't harm anyone. It may trigger your AV, though, so make sure you're ready:

wget https://github.com/CERT-Polska/training-mwdb/raw/main/citadelmalware.bin

This is a dump of a (pretty old) Citadel sample. Now let's try to write a malduck module for it.

First, we need a Yara rule. For example, this one:

rule citadel
{
    meta:
        author = "mak"
        module = "citadel"
    strings:
        $briankerbs = "Coded by BRIAN KREBS for personal use only. I love my job & wife."
        $cit_aes_xor = {81 30 [4] 0F B6 50 03 0F B6 78 02 81 70 04 [4] 81 70 08 [4] 81 70 0C [4] C1 E2 08 0B D7 }
        $cit_salt = { 8A D1 80 E2 07 C0 E9 03 47 83 FF 04 }
        $cit_login = { 30 [1-2] 8A 8? [4] 32  }
        $cit_getpes = { 68 [2] 00 00 8D ( 84 24 | 85) [4] 50 8D ( 85 ?? ?? ?? ?? | 44 24 ?? ) 50 E8 [4] B8 [2] 00 00 50 68 }
        $cit_base_off = { 5? 8D 85 [4] E8 [4] 6A 20 68 [4] 8D [2] 50 E8 [4] 8D 85 [4] 50 }
    condition:
        3 of them
}

Now check that it works:

$ yara -rs citadel.yar citadelmalware.bin
citadel citadelmalware.bin
0x33795:$cit_aes_xor: 81 30 3E 4A BB 01 0F B6 50 03 0F B6 78 02 81 70 04 84 1B B2 98 81 70 08 12 2B B5 EF 81 70 0C B1 ...
0x32e19:$cit_salt: 8A D1 80 E2 07 C0 E9 03 47 83 FF 04
0x32f2f:$cit_login: 30 04 3E 8A 89 B8 5F 40 00 32
0x16593:$cit_base_off: 57 8D 85 20 F9 FF FF E8 12 96 00 00 6A 20 68 B8 5F 40 00 8D 45 EC 50 E8 88 C5 01 00 8D 85 77 FA ...
0x1fbe7:$cit_base_off: 57 8D 85 DC FA FF FF E8 BE FF FF FF 6A 20 68 B8 5F 40 00 8D 45 F0 50 E8 34 2F 01 00 8D 85 33 FC ...

It looks like it does, and multiple symbols matched.

We usually try to match the most interesting or specific fragments of code. For example, cit_aes_xor is related to the AES encryption code, cit_salt is code that reads the salt, etc. Those fragments were picked because they're stable and rarely change between compilations, but also because we can extract useful information with them.
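
To get a feel for what such a hex pattern matches, here is a minimal sketch that rewrites $cit_login as a Python bytes regex and runs it on the bytes Yara reported above. This is only an illustration of the pattern syntax (a jump like [1-2] becomes .{1,2}, a nibble wildcard like 8? becomes a byte range). It is not how Yara or malduck match internally, and the padded buffer is made up for the example.

```python
import re

# Bytes reported by Yara for $cit_login at 0x32f2f (copied from the output above).
match_bytes = bytes.fromhex("30043E8A89B85F400032")

# { 30 [1-2] 8A 8? [4] 32 } rewritten as a bytes regex:
#   [1-2] -> .{1,2}         (one or two arbitrary bytes)
#   8?    -> [\x80-\x8f]    (high nibble fixed, low nibble arbitrary)
#   [4]   -> .{4}           (four arbitrary bytes)
cit_login = re.compile(rb"\x30.{1,2}\x8a[\x80-\x8f].{4}\x32", re.DOTALL)

# The reported bytes match the pattern exactly.
whole = cit_login.fullmatch(match_bytes)

# The same pattern is found inside a larger (made-up) buffer.
buffer = b"\x90\x90\x90" + match_bytes + b"\xcc"
found = cit_login.search(buffer)

print(whole is not None, found.start())
```

The offset returned by the search plays the same role as the address that Yara prints in front of each symbol.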

This is where the code of a malduck module comes in. Your role as a programmer is to provide callbacks for the interesting symbols and extract additional information with them:

import logging
from malduck.extractor import Extractor

log = logging.getLogger()


class Citadel(Extractor):  # @Extractor
    family = "citadel"
    yara_rules = "citadel",  # mind the comma (this is a tuple, not a string)

    # Callback for "briankerbs" symbol (by default function name is used).
    @Extractor.extractor("briankerbs")
    def citadel_found(self, p, addr):
        log.info('[+] `Coded by Brian Krebs` str @ %X' % addr)
        return {'family': 'citadel'}

    @Extractor.extractor
    def cit_salt(self, p, addr):  # @Callbacks
        salt = p.uint32v(addr - 8)  # @Procmem
        log.info('[+] Found salt @ %X - %x' % (addr, salt))
        return {'salt': salt}

What's going on here? This is a pretty simple module with two callbacks: citadel_found, bound explicitly to the briankerbs string, and cit_salt, bound by its function name (so the name matters).

Extractor

We have just created an "extractor". An extractor is responsible for extracting configs from dumps of a given family (in this case, citadel). It needs a corresponding .yar file with one or many rules.

Callbacks

Extractors usually have multiple callbacks. Each callback is called once for every occurrence of its matching symbol from the Yara rules [1].

In this case, the cit_salt callback will be called with addr=0x32e19 (the address of the symbol in the dump - see above).
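
The binding between symbol names and callbacks can be pictured with a toy registry. The sketch below is a simplified stand-in and not malduck's implementation: the extractor decorator, the registry dict and the plain-function callbacks are all made up for illustration. It only shows the idea that a callback is registered under an explicit symbol name, or under its own function name by default, and is then called once per matching symbol occurrence.

```python
# Toy registry: symbol name -> callback (hypothetical, not malduck's internals).
registry = {}


def extractor(arg):
    """Register a callback under a symbol name."""
    if callable(arg):            # used as @extractor: bind by function name
        registry[arg.__name__] = arg
        return arg

    def decorate(func):          # used as @extractor("symbol"): bind explicitly
        registry[arg] = func
        return func

    return decorate


@extractor("briankerbs")
def citadel_found(addr):
    return {"family": "citadel"}


@extractor
def cit_salt(addr):
    return {"salt_found_at": addr}


# Some of the (symbol, address) pairs from the Yara output above.
matches = [("cit_salt", 0x32E19), ("cit_login", 0x32F2F), ("cit_base_off", 0x16593)]

# Call the registered callback for every match that has one.
results = [registry[name](addr) for name, addr in matches if name in registry]
print(results)
```

Symbols without a callback (here cit_login and cit_base_off) are simply skipped, and briankerbs is registered but never called because that string did not match in this dump.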

Callbacks are responsible for extracting simple pieces of information, and they return them as Python dict objects. In the "real world", there are usually multiple callbacks, and their result is combined. For example, if one callback returns:

{ "salt": "xyz123" }

And the other one returns:

{ "key": "ilovemalware13" }

Then the final config is:

{
    "salt": "xyz123",
    "key": "ilovemalware13"
}
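
The combining step can be sketched as a plain dictionary merge. This is a minimal illustration and not malduck's merging code. In particular, raising an error when two callbacks disagree on a key is an assumption made for this sketch, and the merge_configs name is hypothetical.

```python
def merge_configs(parts):
    """Merge partial configs returned by callbacks into one dict."""
    config = {}
    for part in parts:
        for key, value in part.items():
            # Assumed policy for this sketch: conflicting values are an error.
            if key in config and config[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
            config[key] = value
    return config


final = merge_configs([
    {"salt": "xyz123"},
    {"key": "ilovemalware13"},
])
print(final)
```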

Callbacks can be very simple, like the citadel_found function. It is only called when the symbol of interest is found in the binary.

Beyond uint32v

Of course, just reading a uint32 is not overly impressive. Let's look at a slightly more advanced callback (from the full version of the Citadel extractor):

    @Extractor.extractor
    def cit_login(self, p, addr):
        log.info('[+] Found login_key xor @ %X' % addr)
        hit = p.uint32v(addr + 4)
        if p.is_addr(hit):
            return {'login_key': p.asciiz(hit)}

        hit = p.uint32v(addr + 5)
        if p.is_addr(hit):
            return {'login_key': p.asciiz(hit)}

To understand what's going on here, we need to look at the assembly code. Recall the Yara matches:

0x32f2f:$cit_login: 30 04 3E 8A 89 B8 5F 40 00 32

Let's disassemble it:

$ echo 30043E8A89B85F400032 | xxd -r -ps | ndisasm -b 32 -
00000000  30043E            xor [esi+edi],al
00000003  8A89B85F4000      mov cl,[ecx+0x405fb8]
00000009  32                db 0x32

As we can see, this is just a simple piece of code that moves data around. It's interesting because the mov instruction copies a byte from the AES key into the cl register [2]. This means that we can use it to find the location of the AES key in memory - in this case, it's 0x405fb8 (the displacement encoded in mov's ModR/M operand).

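So we can get the address of the AES key. The Yara pattern allows one or two bytes between the 30 opcode byte and the mov, so the address starts either 4 or 5 bytes into the match, which is why the callback tries both offsets. In this sample the xor instruction is three bytes long, so the second offset applies:

hit = p.uint32v(addr + 5)

Of course, the address alone is not very useful - it may be different in every analysed binary (or even change between executions). We also need to read the key: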

return {'login_key': p.asciiz(hit)}

asciiz is one of many helper methods for reading various types of data. As the name suggests, it reads an ASCII string starting at the hit address and stopping at the first null byte.
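
The two offsets tried in cit_login can be checked by hand against the bytes from the Yara match. The sketch below uses only the standard struct module on those ten bytes. It is not malduck code, and the remark about which value is_addr would accept is an assumption, since that depends on what is mapped in the dump.

```python
import struct

# Bytes reported by Yara for $cit_login at 0x32f2f.
match_bytes = bytes.fromhex("30043E8A89B85F400032")

# Little-endian 32-bit reads at the two offsets tried by the callback.
at_plus_4 = struct.unpack_from("<I", match_bytes, 4)[0]
at_plus_5 = struct.unpack_from("<I", match_bytes, 5)[0]

print(hex(at_plus_4))  # includes the ModR/M byte 0x89, so not a meaningful pointer
print(hex(at_plus_5))  # the displacement from the mov instruction
```

Here the read at +4 gives 0x405fb889 and the read at +5 gives 0x405fb8, which matches the disassembly. Presumably the first value fails the is_addr check in the real extractor, so the callback falls through to the second read.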

Malduck ninjutsu

Sometimes you really need to flex your module-writing skills. For example, imagine that the key is XORed at runtime by the assembly code itself (the xor key is not stored anywhere in a data segment), so the relevant code changes with every recompilation. This is precisely what happens in Citadel. Let's disassemble the cit_aes_xor hit:

$ echo 81303e4abb010fb650030fb67802817004841bb298817008122bb5ef81700cb1bed171c1e2080bd7 | xxd -r -ps | ndisasm -b32 -
00000000  81303E4ABB01      xor dword [eax],0x1bb4a3e
00000006  0FB65003          movzx edx,byte [eax+0x3]
0000000A  0FB67802          movzx edi,byte [eax+0x2]
0000000E  817004841BB298    xor dword [eax+0x4],0x98b21b84
00000015  817008122BB5EF    xor dword [eax+0x8],0xefb52b12
0000001C  81700CB1BED171    xor dword [eax+0xc],0x71d1beb1
00000023  C1E208            shl edx,byte 0x8
00000026  0BD7              or edx,edi

How do you write a module for it? Well, there are multiple options. But the easiest and most readable one is to use a disassembler to our advantage:

    # Needs `import malduck` and `from malduck import p32` at the top of the module.
    @Extractor.extractor
    def cit_aes_xor(self, p, addr):
        log.info('[+] Found aes_xor key @ %X' % addr)
        r = []

        for c in p.disasmv(addr, 40):  # disassemble 40 bytes starting from addr
            if len(r) == 4:  # key is always 4 dwords long
                break
            if c.mnem == 'xor':
                r.append(c.op2.value)  # immediate operand of the xor
        return {'aes_xor': malduck.enhex(b''.join(map(p32, r)))}

We disassemble the code until we find four xor instructions and concatenate their immediate operands into the final aes_xor config key.
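
The last line of the callback can be reproduced with the standard library. The sketch below packs the four immediates taken from the disassembly above as little-endian dwords and hex-encodes the result. It uses struct and bytes.hex as stand-ins for the p32 and enhex helpers, so it only approximates what the callback returns.

```python
import struct

# Immediate operands of the four xor instructions, in order of appearance.
xor_operands = [0x01BB4A3E, 0x98B21B84, 0xEFB52B12, 0x71D1BEB1]

# Pack each value as a little-endian dword and join them into a 16-byte key.
key_bytes = b"".join(struct.pack("<I", value) for value in xor_operands)
aes_xor = key_bytes.hex()

print(aes_xor)
```

The result is the same byte sequence that appears as the immediates in the raw instruction bytes, which is a quick way to sanity-check the callback's output.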

Procmem

Last but not least, we should talk about process memory objects.

The files we work on are various kinds of memory maps (like PE files, ELF files, or memory dumps). We usually care more about their in-memory layout than their on-disk layout. For example, we often ask "read 5 bytes from address 0x400100" rather than "what is the 117th byte of the file".

Process Memory objects are the abstraction that makes it possible. They load various types of files to memory, and implement functions like .readv (read a chunk of memory from a given virtual address).

Right now, the supported formats are:

  • PE files
  • memory dumps
  • ELF files
  • IDA interactive session (IDAMem objects)
  • memory dumps in Cuckoo 2.x format

But it's not hard to add a new format when necessary.
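
To make the idea concrete, here is a toy model of such an object: a single buffer mapped at a base address, with a few helpers that mirror the method names used in this tutorial. The ToyProcMem class is hypothetical and heavily simplified. Real process memory objects handle multiple regions and file formats, so treat this only as a sketch of address-based reads.

```python
import struct


class ToyProcMem:
    """One contiguous buffer mapped at a virtual base address (illustration only)."""

    def __init__(self, buf, base):
        self.buf = buf
        self.base = base

    def is_addr(self, addr):
        # True if the address falls inside the mapped buffer.
        return self.base <= addr < self.base + len(self.buf)

    def readv(self, addr, size):
        # Translate the virtual address to a buffer offset and slice.
        offset = addr - self.base
        return self.buf[offset:offset + size]

    def uint32v(self, addr):
        # Little-endian 32-bit unsigned integer at the given address.
        return struct.unpack("<I", self.readv(addr, 4))[0]

    def asciiz(self, addr):
        # Bytes from the address up to (not including) the next null byte.
        offset = addr - self.base
        end = self.buf.index(b"\x00", offset)
        return self.buf[offset:end]


mem = ToyProcMem(b"\xef\xbe\xad\xdekey123\x00tail", base=0x400000)
print(hex(mem.uint32v(0x400000)), mem.asciiz(0x400004), mem.is_addr(0x400100))
```

The callbacks shown earlier only ever talk in virtual addresses, which is what lets the same extractor code run on different input formats.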

Try it out!

Now it's time to try our module. Copy and paste the Yara and Python files, or download them from our GitHub:

wget https://github.com/CERT-Polska/training-mwdb/raw/main/modules.7z
7z x modules.7z

The modules/ directory should look like this:

$ find
.
./modules
./modules/citadel
./modules/citadel/citadel.yar
./modules/citadel/citadel.py
./modules/citadel/__init__.py
./modules/__init__.py

Now, try to run the extractor on a downloaded Citadel sample:

$ malduck extract citadelmalware.bin --modules modules
[+] Ripped 'citadel' from citadelmalware.bin:
{
    "family": "citadel",
    "salt": 4073311727
}

It looks like it worked!

Karton integration with karton-config-extractor

How does it all relate to the Karton framework? Malduck is packaged as karton-config-extractor, and you can easily plug it into your pipeline. See Karton Gems 1 for a longer description of that topic.

Like in Karton Gems 1, you need a karton-playground (docker-compose with a dev environment) running on your local machine:

$ git clone https://github.com/CERT-Polska/karton-playground.git
$ cd karton-playground
$ sudo docker-compose up  # this may take a while

Long story short, just install the config-extractor package and run it:

$ python3 -m venv venv; source ./venv/bin/activate
$ pip install karton-config-extractor
$ karton-config-extractor --modules modules
[2021-05-13 15:27:10,085][INFO] Service karton.config-extractor started
[2021-05-13 15:27:10,098][INFO] Binds changed, old service instances should exit soon.
[2021-05-13 15:27:10,099][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'win32'}
[2021-05-13 15:27:10,100][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'win64'}
[2021-05-13 15:27:10,100][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'linux'}
[2021-05-13 15:27:10,101][INFO] Binding on: {'type': 'analysis', 'kind': 'drakrun-prod'}
[2021-05-13 15:27:10,101][INFO] Binding on: {'type': 'analysis', 'kind': 'drakrun'}
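
Each "Binding on" line above is a header filter: the service receives tasks whose headers match at least one of them. The sketch below illustrates that idea with plain dictionaries and exact equality. The matches_any function is hypothetical, and real Karton routing may support richer filter semantics than this, so take it as a simplified model rather than the framework's logic.

```python
def matches_any(headers, filters):
    """True if every key/value of at least one filter is present in the task headers."""
    return any(
        all(headers.get(key) == value for key, value in flt.items())
        for flt in filters
    )


# Two of the filters from the log output above.
filters = [
    {"type": "sample", "stage": "recognized", "kind": "runnable", "platform": "win32"},
    {"type": "analysis", "kind": "drakrun"},
]

# Made-up task headers for the example.
analysis_task = {"type": "analysis", "kind": "drakrun", "quality": "high"}
other_task = {"type": "sample", "stage": "recognized", "kind": "runnable", "platform": "macos"}

print(matches_any(analysis_task, filters), matches_any(other_task, filters))
```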

Now upload an executable file to mwdb, and new logs should appear:

[2021-05-13 15:27:20,940][INFO] Received new task - ee2abafc-271b-441b-b81c-77f264c8e120
[2021-05-13 15:27:20,981][INFO] Processing drakmon OSS analysis, sample: 3a153c52aa82a667091dff9a4b4defb7a6e395c3d0604d7aa18f75ca6a27e77e
[2021-05-13 15:27:24,130][INFO] Merging and reporting extracted configs
[2021-05-13 15:27:24,131][INFO] done analysing, results: {"analysed": 94, "crashed": 0}
[2021-05-13 15:27:24,156][INFO] Task done - ee2abafc-271b-441b-b81c-77f264c8e120

When a config is extracted successfully, it's added to mwdb automatically.

What's next

That's it, enough kartoning for today. Porting all your modules to malduck may be a long and exhausting endeavour, but it was worth it for us.

You may use the community to your advantage. There is a small but growing repository with publicly available modules at https://github.com/c3rb3ru5d3d53c/mwcfg-modules. You can use it as a starting point for your modules or get a better feel of malduck. If possible, try to contribute back.

In future instalments of the series, we'll talk a bit about other open-sourced kartons and deployment options.


  1. There is also a special callback called handle_match. It is called once for every matched binary and can return additional config. You can use it for processing that doesn't fit the malduck framework nicely. 

  2. We know this because we reverse-engineered that sample thoroughly. 
