Anti-Disassembly
- Look at the disassembled instructions and start to understand the program
- But malware authors use the same tools as malware analysts
- They use anti-disassembly techniques to stop us analysing the code
Anti-Disassembly
Aim is to slow down the analyst
Make the job of disassembly as hard as possible
Reality is that any code that can be executed can be disassembled
Malware authors' aim is to push the skill required up as much as possible
Same techniques also help prevent anti-virus heuristics from analysing the code
Not talking about trying to camouflage code or data
Will see ways to do that
Rather looking at how code can be written
- So the disassembler misinterprets it
- But the CPU can still execute it without any issue
Assembly
- Assembling is relatively straightforward
- Take an instruction, convert it into the relevant machine code opcodes
- Move on to the next instruction
- Result is a stream of bytes for the CPU to execute
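The steps above can be sketched in Python as a toy assembler. This is only a minimal sketch, not a real assembler: the function name `assemble` and the mnemonic spellings (`mov_eax`, `jmp_short`) are made up for illustration, and only a handful of fixed 32-bit x86 encodings are covered.

```python
import struct

def assemble(program):
    """Toy assembler: translate each instruction in turn and append its bytes.

    The mnemonic names are hypothetical; the encodings are the usual
    32-bit x86 ones for this hand-picked subset.
    """
    out = bytearray()
    for mnemonic, *operands in program:
        if mnemonic == "nop":
            out += b"\x90"
        elif mnemonic == "ret":
            out += b"\xc3"
        elif mnemonic == "mov_eax":      # mov eax, imm32 -> B8 + little-endian imm32
            out += b"\xb8" + struct.pack("<I", operands[0])
        elif mnemonic == "jmp_short":    # jmp rel8 -> EB + signed 8-bit displacement
            out += b"\xeb" + struct.pack("<b", operands[0])
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return bytes(out)

program = [("mov_eax", 0x2A), ("nop",), ("ret",)]
code = assemble(program)
print(code.hex())   # b82a00000090c3
```

Each instruction is handled on its own and the results are simply concatenated, which is why the assembler never has to guess where anything starts.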
Disassembly
- Harder problem
- Easy to convert a single instruction, once we know where it starts
- Program is a stream of bytes, one after the other
- x86 instructions can vary in length from 1 to 15 bytes
- Cannot decode an instruction, until we know where it starts
- Cannot know where an instruction starts, until we decode the previous instruction
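To illustrate why the start position matters, the sketch below decodes the same six bytes from two different offsets. It assumes a tiny hand-picked table of 32-bit x86 opcodes (`LENGTHS`) and a hypothetical helper `decode_from`; a real decoder handles far more cases.

```python
# Opcode byte -> (mnemonic, total instruction length in bytes).
# Assumption: toy subset of 32-bit x86, enough for this example only.
LENGTHS = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp short", 2),
    0xB8: ("mov eax, imm32", 5),
}

def decode_from(code, offset):
    """Decode instructions one after another, starting at the given offset."""
    listing = []
    while offset < len(code):
        # Bytes outside the toy table are shown as single data bytes.
        name, length = LENGTHS.get(code[offset], ("db", 1))
        listing.append((offset, name))
        offset += length
    return listing

code = bytes.fromhex("b8 eb 01 90 90 c3")

print(decode_from(code, 0))
# [(0, 'mov eax, imm32'), (5, 'ret')]
print(decode_from(code, 1))
# [(1, 'jmp short'), (3, 'nop'), (4, 'nop'), (5, 'ret')]
```

Both listings are valid decodings of the same bytes. Which one is "right" depends entirely on where decoding begins.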
Anti-Disassembly
- Key to understanding the techniques
- Understand they are trying to force the disassembler to make the wrong decisions about where instructions start
- So it disassembles phoney instructions made up of bytes from the middle of real instructions
- Need to understand how a disassembler is implemented
- CPU dependent
Disassembly Techniques
- Take in one or more bytes (depending on the length)
- Convert that to the assembly code that would generate that instruction
- Move on to the next instruction
- Repeat
But the next instruction might not be correct
Two approaches are generally used to determine the next instruction: linear disassembly or flow disassembly
Flow disassembly generally gives better results than linear
Linear disassembly
Method
- Decode first instruction at start address
- Then know the length (in bytes) of that instruction
- Add length to address of that instruction to give address of next instruction
- Repeat
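The method above can be sketched as a short loop. This is a minimal sketch under stated assumptions: the `OPCODES` table is a hand-picked subset of 32-bit x86 with fixed lengths, and `linear_sweep` and the start address `0x401000` are illustrative names and values, not taken from any particular tool.

```python
# Opcode byte -> (mnemonic, instruction length in bytes).
# Assumption: toy subset of 32-bit x86 only.
OPCODES = {
    0x40: ("inc eax", 1),
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xB8: ("mov eax, imm32", 5),
}

def linear_sweep(code, start_address):
    """Decode at the start address, add the length, decode again, repeat."""
    listing = []
    offset = 0
    while offset < len(code):
        # Unknown bytes are listed as one-byte data so the sweep can continue.
        name, length = OPCODES.get(code[offset], ("db", 1))
        listing.append((start_address + offset, length, name))
        offset += length   # address of the next instruction
    return listing

code = bytes.fromhex("b8 01 00 00 00 40 c3")
for address, length, name in linear_sweep(code, 0x401000):
    print(f"{address:08x}  {length}  {name}")
```

Note that the loop never looks at what an instruction does, only at how long it is.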
Thoughts
- Generally works
- Can go wrong even with non-malicious binaries
- Tempting to think that the .text section will only contain code
- Not necessarily the case, sometimes data is intermingled
- Linear disassembler would treat....
Breaking
- Very easy to break a linear disassembler
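One well-known way to do this, shown here as an illustration rather than as the specific example intended above, is to jump over a single rogue byte. The sketch assumes a toy opcode table (`OPCODES`) and a hypothetical `linear_sweep` helper; the byte sequence is hand-built for the demonstration.

```python
# Opcode byte -> (mnemonic, instruction length in bytes).
# Assumption: toy subset of 32-bit x86 only.
OPCODES = {
    0x40: ("inc eax", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp short", 2),
    0xE8: ("call", 5),
}

def linear_sweep(code, offset=0):
    """Decode back to back from the given offset, ignoring control flow."""
    listing = []
    while offset < len(code):
        name, length = OPCODES.get(code[offset], ("db", 1))
        listing.append((offset, name))
        offset += length
    return listing

# Layout: jmp short +1 | rogue byte E8 | inc eax (x4) | ret
code = bytes.fromhex("eb 01 e8 40 40 40 40 c3")

# What a linear disassembler reports: the rogue E8 is decoded as a
# 5-byte call that swallows the four real inc eax instructions.
print(linear_sweep(code))
# [(0, 'jmp short'), (2, 'call'), (7, 'ret')]

# What the CPU runs after the jmp at offset 0 lands on offset 3.
print(linear_sweep(code, offset=3))
# [(3, 'inc eax'), (4, 'inc eax'), (5, 'inc eax'), (6, 'inc eax'), (7, 'ret')]
```

The rogue byte is never executed, so the program behaves normally while the listing shows a phoney call.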
Flow Disassembly
- Aims to improve the detection of instructions (and rejection of data)
- Follows the flow of the program rather than linearly processing instructions
Instruction Flow
Break instructions down into groups based on what instruction will execute next
- Always execute the next instruction
- Can execute either the next instruction or a specified one (call, jz, jnz)
- Always execute the instruction at a specified address, e.g. jmp
- Next instruction unknown, e.g. ret
For simple instructions, this works just like linear disassembly
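The four groups can be turned into a small flow-following disassembler. This is a minimal sketch under stated assumptions: the `OPCODES` table covers only a few 32-bit x86 opcodes, the group labels (`"next"`, `"branch"`, `"jump"`, `"stop"`) and function names are made up for illustration, and only relative branch targets are handled.

```python
import struct

# Opcode byte -> (mnemonic, length in bytes, flow group).
# Assumption: toy subset of 32-bit x86; group labels are hypothetical.
OPCODES = {
    0x90: ("nop", 1, "next"),
    0x40: ("inc eax", 1, "next"),
    0xB8: ("mov eax, imm32", 5, "next"),
    0x74: ("jz", 2, "branch"),
    0x75: ("jnz", 2, "branch"),
    0xE8: ("call", 5, "branch"),
    0xEB: ("jmp short", 2, "jump"),
    0xE9: ("jmp near", 5, "jump"),
    0xC3: ("ret", 1, "stop"),
}

def target_of(code, offset, length):
    """Relative target: end of the instruction plus its signed displacement."""
    fmt = "<b" if length == 2 else "<i"   # rel8 or rel32
    (disp,) = struct.unpack_from(fmt, code, offset + 1)
    return offset + length + disp

def flow_disassemble(code, entry=0):
    """Decode only the offsets that control flow can actually reach."""
    seen = {}
    worklist = [entry]
    while worklist:
        offset = worklist.pop()
        if offset in seen or not 0 <= offset < len(code):
            continue
        if code[offset] not in OPCODES:
            continue                      # unknown opcode: give up on this path
        name, length, group = OPCODES[code[offset]]
        if offset + length > len(code):
            continue                      # truncated instruction
        seen[offset] = name
        if group in ("next", "branch"):   # execution may fall through
            worklist.append(offset + length)
        if group in ("branch", "jump"):   # execution may go to the target
            worklist.append(target_of(code, offset, length))
        # group "stop" (ret): next instruction unknown, queue nothing
    return sorted(seen.items())

# jmp short +1 | rogue byte E8 | inc eax (x4) | ret
code = bytes.fromhex("eb 01 e8 40 40 40 40 c3")
print(flow_disassemble(code))
```

For the sample bytes, the jmp at offset 0 sends decoding straight to offset 3, so the byte at offset 2 is never treated as an instruction.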