Data Encoding
- Malware needs to hide its intent
- Applies to both its operation (code) but also the data it uses
- Malware authors use anti-disassembly techniques to stop us analysing the code
- And data encoding to stop us from seeing any data
Confusing a flow disassembler
Need to get disassembler to decode the wrong bytes as instructions
But also need to confuse its attempt to follow the flow
Flow disassembler will decode both execution flows
Those two flows refer to the same set of bytes
Need a way to force the disassembler to decode the instructions one way
But not the way the CPU would decode them
Code runs fine, but the disassembler shows us the wrong instructions
Can do this relatively easily by inserting code, which looks conditional to the disassembler
Techniques
- Jumps to the same target
- Unconditional Conditional jumps
- Impossible disassembly
- Function Pointers
- Manipulating the return address from subroutines
Anti-Anti-Disassembly
Any code that can be executed can be disassembled May just have to guide the process a bit
- Explicitly choosing which parts of the program are code and data
- May have to patch the code with
nop
instructions to get it to disassemble correctly
Data Encoding
- Malware needs to hide its intent
- Applies to both its operation but also the data it uses
- Data encoding refers to any form of content modification used for the purpose of hiding intent
- Need to understand the encoding techniques used to understand what malware does
Malware data encoding
Malware will use data encoding to
- Hide configuration information
- Save information to a staging file before stealing it
- Store strings used by the malware
- Disguise itself as a legitimate tool
Mechanisms for Data Encoding
- Malware could (and does) use standard cryptographic algorithms for data encoding
- But malware is just as likely to use simple techniques as complex ones
- Small enough to be used in space-constrained enviroments
- Less obvious than more complex ciphers
- Low overhead, little impact on performance
- Not expecting immunity from being cracked, rather simply looking for an easy way to prevent basic analysis
XOR Cipher
- Common mechanism used by malware authors is to use it
- XOR each byte of the data with a known value
- Convenient to use
- Simple to implement (One instruction)
- Reversible
Brute forcing XOR encoding
- Very easy to brute force crack simple XOR encoding
- Only one of 255 possible values used to encode the data
- Simple take a portion of the encoding data and attempt to decode it using each of the 255 possible keys
- Look at the result and see if something recognisable pops out
- Can also be precomputed if knowing a string could be present
Null-preserving single byte XOR encoding
- Malware authors sometimes use a technique to mitigate this
- Use NULL=preserving single byte encoding scheme
- Rather than XOR every byte, this has two rules
- If byte is zero, or the key then the byte is skipped
- If byte is neither zero or the key, then XOR byte with key
- Still a reversible algorithm
XOR encoding
- Strait-forward to find this code in the disassembler
- Search for
xor
instructions - XOR with a constant value
- XOR a register with another different register
- Look out for small loops containing
xor
instructions
Variations on a theme
- Single byte encoding is relatively weak
- Malware authors have implemented more involved encoding schemes
- Less susceptible to brute-force, but just as simple to implement
- Using addition/subtraction
- Bit rotation
- ROT-n
- Multibyte