Data Encoding

Confusing a flow disassembler

Need to get disassembler to decode the wrong bytes as instructions
But also need to confuse its attempt to follow the flow
Flow disassembler will decode both execution flows
Those two flows refer to the same set of bytes
Need a way to force the disassembler to decode the instructions one way
But not the way the CPU would decode them
Code runs fine, but the disassembler shows us the wrong instructions
Can do this relatively easily by inserting a jump that looks conditional to the disassembler but is always taken at run time

Any code that can be executed can be disassembled
May just have to guide the process a bit

Explicitly choosing which parts of the program are code and data
May have to patch the code with nop instructions to get it to disassemble correctly
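
A minimal sketch of the trick and of the nop fix, using a toy decoder that knows only the six opcodes in the sample. The byte sequence and the names `LENGTHS`, `NAMES` and `decode_run` are assumptions made for illustration, not a real disassembler or real malware. It models the case where the disassembler decodes the fall-through bytes of the jump first.

```python
# Toy decoder: knows only the opcodes in the sample below (an assumption
# made for illustration; a real disassembler handles the full x86 set).
LENGTHS = {0x33: 2, 0x74: 2, 0xE8: 5, 0x58: 1, 0xC3: 1, 0x90: 1}
NAMES = {0x33: "xor eax, eax", 0x74: "jz", 0xE8: "call",
         0x58: "pop eax", 0xC3: "ret", 0x90: "nop"}


def decode_run(code, offset):
    """Decode instructions one after another until ret or end of data."""
    out = []
    while offset < len(code):
        op = code[offset]
        if offset + LENGTHS[op] > len(code):
            break
        out.append((offset, NAMES[op]))
        offset += LENGTHS[op]
        if op == 0xC3:
            break
    return out


# xor eax, eax sets the zero flag, so the jz that follows is always taken.
# The 0xE8 byte after the jz is never executed; it only exists to make the
# disassembler start decoding a 5-byte call at the wrong place.
code = bytes([0x33, 0xC0,          # xor eax, eax
              0x74, 0x01,          # jz +1 (skips the rogue byte)
              0xE8,                # rogue byte, looks like the start of a call
              0x58,                # pop eax (the real next instruction)
              0xC3,                # ret
              0x90, 0x90, 0x90])   # padding

# Disassembler view: trusts the fall-through path after the jz.
disasm_view = decode_run(code, 0)

# CPU view: the jump is taken, target = address after jz + rel8 offset.
jz_target = 4 + code[3]
cpu_view = decode_run(code, 0)[:2] + decode_run(code, jz_target)

# Fix: patch the rogue byte with a nop and decode again.
patched = code[:4] + b"\x90" + code[5:]
patched_view = decode_run(patched, 0)

for name, view in [("disassembler", disasm_view), ("cpu", cpu_view),
                   ("patched", patched_view)]:
    print(name, view)
```

In the disassembler view the pop and ret are swallowed as the operand of a bogus call. After the patch the listing matches what the CPU executes.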

Malware needs to hide its intent
Applies to both its operation and the data it uses
Data encoding refers to any form of content modification used for the purpose of hiding intent
Need to understand the encoding techniques used to understand what malware does

Malware will use data encoding to

Malware could (and does) use standard cryptographic algorithms for data encoding
But malware is just as likely to use simple techniques as complex ones
- Small enough to be used in space-constrained environments
- Less obvious than more complex ciphers
- Low overhead, little impact on performance
Not expecting immunity from being cracked, rather simply looking for an easy way to prevent basic analysis

Very easy to brute-force simple XOR encoding
A single-byte key has only 255 useful values (a key of zero leaves the data unchanged)
Simply take a portion of the encoded data and attempt to decode it using each of the 255 possible keys
Look at the result and see if something recognisable pops out
Can also precompute the encoded forms of a string that is likely to be present and search the data for those
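
A minimal sketch of both approaches. The function names, the sample plaintext and the key `0x12` are assumptions chosen for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def brute_force(encoded: bytes, known: bytes) -> list:
    """Decode with every key 1..255 and keep keys whose output contains known."""
    return [key for key in range(1, 256)
            if known in xor_bytes(encoded, key)]


def find_key_precomputed(encoded: bytes, known: bytes) -> list:
    """Precompute the encoded form of known for each key and search for it."""
    patterns = {key: xor_bytes(known, key) for key in range(1, 256)}
    return [key for key, pattern in patterns.items() if pattern in encoded]


# Hypothetical sample: a familiar string encoded with an assumed key of 0x12.
plaintext = b"This program cannot be run in DOS mode"
encoded = xor_bytes(plaintext, 0x12)

keys = brute_force(encoded, b"This program")
keys_pre = find_key_precomputed(encoded, b"This program")
print([hex(k) for k in keys], [hex(k) for k in keys_pre])
```

The precomputed search never decodes the whole sample, so it suits scanning large files for a string expected to be present.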

Malware authors sometimes use a technique to mitigate this
Use a NULL-preserving single-byte encoding scheme
Rather than XOR every byte, this has two rules
- If the byte is zero or equal to the key, it is skipped
- If the byte is neither zero nor the key, XOR it with the key
Still a reversible algorithm
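
A minimal sketch of the two rules, with a comparison against plain XOR. Plain XOR turns a run of zero bytes into a run of the key, which gives the key away; the NULL-preserving variant leaves zeros as zeros. The function names, sample bytes and key `0x41` are assumptions for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Plain single-byte XOR, for comparison."""
    return bytes(b ^ key for b in data)


def null_preserving_xor(data: bytes, key: int) -> bytes:
    """Skip bytes equal to zero or to the key, XOR everything else."""
    return bytes(b if b in (0, key) else b ^ key for b in data)


key = 0x41

# Plain XOR over zero bytes exposes the key directly.
leaked = xor_bytes(b"\x00\x00\x00\x00", key)

# Hypothetical sample containing zeros and a byte equal to the key.
data = b"MZ\x00\x00\x41\x00PE"
encoded = null_preserving_xor(data, key)

# The same function decodes: a byte that was XORed is never zero or the
# key afterwards, so the second pass XORs exactly the same positions.
decoded = null_preserving_xor(encoded, key)
print(leaked, encoded, decoded)
```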

Single byte encoding is relatively weak
Malware authors have implemented more involved encoding schemes
Less susceptible to brute-force, but just as simple to implement
- Using addition/subtraction
- Bit rotation
- ROT-n
- Multibyte
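
A minimal sketch of each scheme in the list above. The function names and sample values are assumptions for illustration. The multibyte case is shown as a repeating-key XOR, which is one possible reading of "multibyte".

```python
def add_encode(data: bytes, key: int) -> bytes:
    """Add the key to every byte, wrapping at 256."""
    return bytes((b + key) % 256 for b in data)


def sub_decode(data: bytes, key: int) -> bytes:
    """Undo add_encode by subtracting the key, wrapping at 256."""
    return bytes((b - key) % 256 for b in data)


def rol8(data: bytes, n: int) -> bytes:
    """Rotate every byte left by n bits."""
    n %= 8
    return bytes(((b << n) | (b >> (8 - n))) & 0xFF for b in data)


def ror8(data: bytes, n: int) -> bytes:
    """Rotate every byte right by n bits (inverse of rol8)."""
    return rol8(data, 8 - (n % 8))


def rot_n(text: str, n: int) -> str:
    """Shift ASCII letters by n places in the alphabet, leave the rest."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + n) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def multibyte_xor(data: bytes, key: bytes) -> bytes:
    """XOR with a repeating multi-byte key; applying it twice decodes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


sample = b"http://example.invalid/config"
print(sub_decode(add_encode(sample, 0x20), 0x20) == sample)
print(ror8(rol8(sample, 3), 3) == sample)
print(rot_n("Hello", 13))
print(multibyte_xor(multibyte_xor(sample, b"\x13\x37\xbe\xef"), b"\x13\x37\xbe\xef") == sample)
```

A 4-byte key raises the brute-force space from 255 keys to roughly 4 billion, while the encoder stays a few lines long.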