Data Encoding

  • Malware needs to hide its intent
  • Applies to both its operation (code) and the data it uses
  • Malware authors use anti-disassembly techniques to stop us analysing the code
  • And data encoding to stop us from seeing any data

Confusing a flow disassembler

  • Need to get disassembler to decode the wrong bytes as instructions

  • But also need to confuse its attempt to follow the flow

  • A flow disassembler will decode both branches of a conditional jump

  • Those two flows refer to the same set of bytes

  • Need a way to force the disassembler to decode the instructions one way

  • But not the way the CPU would decode them

  • Code runs fine, but the disassembler shows us the wrong instructions

  • Can do this relatively easily by inserting code that looks conditional to the disassembler

Techniques

  • Jumps to the same target
  • Conditional jumps that are always taken (constant condition)
  • Impossible disassembly
  • Function Pointers
  • Manipulating the return address from subroutines

Anti-Anti-Disassembly

Any code that can be executed can be disassembled. We may just have to guide the process a bit:

  • Explicitly choosing which parts of the program are code and data
  • May have to patch the code with nop instructions to get it to disassemble correctly
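
As a rough illustration of nop patching, the sketch below overwrites a rogue byte in a short x86 byte sequence. The byte sequence, the offset and the helper name patch_with_nops are assumptions made up for this example, not taken from a real sample. In practice the patch would normally be applied inside the disassembler or to a copy of the binary.

```python
NOP = 0x90  # single-byte x86 nop opcode


def patch_with_nops(code: bytes, offset: int, length: int = 1) -> bytes:
    """Return a copy of code with `length` bytes at `offset` replaced by nops."""
    if offset < 0 or length < 0 or offset + length > len(code):
        raise ValueError("patch range falls outside the code")
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


# Hypothetical "jumps to the same target" sequence:
#   74 03   jz  +3   (target: offset 5)
#   75 01   jnz +1   (target: offset 5)
#   E8      rogue byte, never executed, looks like the start of a call
#   58      pop eax  (the real jump target)
#   C3      ret
code = bytes.fromhex("74037501e858c3")
fixed = patch_with_nops(code, offset=4)
print(fixed.hex())
```

With the rogue byte replaced, a disassembler no longer tries to decode a call starting at offset 4 and can show the instruction at the real jump target.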

Data Encoding

  • Malware needs to hide its intent
  • Applies to both its operation and the data it uses
  • Data encoding refers to any form of content modification used for the purpose of hiding intent
  • Need to understand the encoding techniques used to understand what malware does

Malware data encoding

Malware will use data encoding to

  • Hide configuration information
  • Save information to a staging file before stealing it
  • Store strings used by the malware
  • Disguise itself as a legitimate tool

Mechanisms for Data Encoding

  • Malware could (and does) use standard cryptographic algorithms for data encoding
  • But malware is just as likely to use simple techniques as complex ones
    • Small enough to be used in space-constrained environments
    • Less obvious than more complex ciphers
    • Low overhead, little impact on performance
  • Not expecting immunity from being cracked, rather simply looking for an easy way to prevent basic analysis

XOR Cipher

  • A common mechanism used by malware authors is the XOR cipher
  • XOR each byte of the data with a known value
  • Convenient to use
    • Simple to implement (One instruction)
    • Reversible
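
A minimal sketch of single-byte XOR encoding is shown below. The function name xor_bytes, the sample string and the key 0x3C are illustrative assumptions. The point is that the same routine both encodes and decodes.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


plaintext = b"http://example.com/config"  # made-up string for the example
key = 0x3C                                 # arbitrary illustrative key

encoded = xor_bytes(plaintext, key)
decoded = xor_bytes(encoded, key)  # applying the same key again reverses it

print(encoded.hex())
print(decoded)
```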

Brute forcing XOR encoding

  • Simple XOR encoding is very easy to brute force
  • Only 255 useful single-byte keys exist (0x01 to 0xFF; a key of zero leaves the data unchanged)
  • Simply take a portion of the encoded data and attempt to decode it using each of the 255 possible keys
  • Look at the result and see if something recognisable pops out
  • If a particular string is likely to be present, its 255 encoded forms can be precomputed and searched for directly
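
The brute-force approach can be sketched roughly as follows. The helper names, the 0.95 printable-text threshold and the sample string are assumptions chosen for this example. A real analysis would more likely look for known file headers or strings than rely on a printability score alone.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    printable = sum(
        1 for b in data if 0x20 <= b < 0x7F or b in (0x09, 0x0A, 0x0D)
    )
    return printable / len(data)


def brute_force_xor(sample: bytes, threshold: float = 0.95):
    """Try every non-zero key and keep the results that look like text."""
    hits = []
    for key in range(1, 256):
        decoded = xor_bytes(sample, key)
        if printable_ratio(decoded) >= threshold:
            hits.append((key, decoded))
    return hits


def find_known_string(sample: bytes, known: bytes):
    """Precompute all 255 encoded forms of a known string and search for each."""
    signatures = {key: xor_bytes(known, key) for key in range(1, 256)}
    return [key for key, sig in signatures.items() if sig in sample]


# Made-up encoded sample for the example
secret = xor_bytes(b"This program cannot be run in DOS mode", 0x5A)

for key, decoded in brute_force_xor(secret):
    print(hex(key), decoded)

print([hex(k) for k in find_known_string(secret, b"DOS mode")])
```

The printability filter usually leaves several candidate keys, so the analyst still has to read the output and pick the one that makes sense. The known-string search is more precise when a likely string is available.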

Null-preserving single byte XOR encoding

  • Malware authors sometimes use a technique to mitigate this weakness
  • Use a NULL-preserving single-byte encoding scheme
  • Rather than XOR every byte, this has two rules
    • If the byte is zero or equal to the key, it is left unchanged
    • If the byte is neither zero nor the key, XOR it with the key
  • Still a reversible algorithm
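
The two rules can be sketched as below. The function name null_preserving_xor and the sample bytes are assumptions for illustration. With plain XOR, a run of zero bytes would encode to a run of the key itself, which this scheme avoids.

```python
def null_preserving_xor(data: bytes, key: int) -> bytes:
    """Single-byte XOR that leaves zero bytes and key-valued bytes untouched."""
    out = bytearray()
    for b in data:
        if b == 0 or b == key:
            out.append(b)        # rule 1: skip zero and the key itself
        else:
            out.append(b ^ key)  # rule 2: XOR everything else
    return bytes(out)


key = 0x12                            # arbitrary illustrative key
data = b"MZ\x00\x00\x12\x00config"    # made-up sample with nulls and a key byte

encoded = null_preserving_xor(data, key)
decoded = null_preserving_xor(encoded, key)  # same routine reverses it

print(encoded.hex())
print(decoded == data)
```

The scheme stays reversible because an XORed byte can never come out as zero or as the key: the first would require the input to equal the key, the second would require the input to be zero, and both cases are skipped.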

XOR encoding

  • Straightforward to find this code in the disassembler
  • Search for xor instructions
  • XOR with a constant value
  • XOR of one register with a different register (a register XORed with itself is just being zeroed)
  • Look out for small loops containing xor instructions
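
As a rough sketch of this search, the snippet below filters a text disassembly listing for xor instructions whose two operands differ. The listing format, the regular expression and the function name interesting_xors are assumptions for this example. A real disassembler's own search or scripting interface would normally be used, and this sketch does not attempt to detect loops.

```python
import re

# Matches "xor <dst>, <src>" but not other mnemonics such as xorps
XOR_RE = re.compile(r"^\s*xor\s+([^,]+?)\s*,\s*(.+?)\s*$", re.IGNORECASE)


def interesting_xors(listing):
    """Return (line number, text) for xor instructions with differing operands."""
    hits = []
    for number, line in enumerate(listing, start=1):
        match = XOR_RE.match(line)
        if not match:
            continue
        dst, src = (operand.lower() for operand in match.groups())
        if dst != src:  # xor reg, reg with the same register only zeroes it
            hits.append((number, line.strip()))
    return hits


# Made-up listing for the example
listing = [
    "mov ecx, 10h",
    "xor eax, eax",
    "xor byte ptr [esi], 5Ah",
    "inc esi",
    "xor edx, ebx",
]

for number, text in interesting_xors(listing):
    print(number, text)
```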

Variations on a theme

  • Single byte encoding is relatively weak
  • Malware authors have implemented more involved encoding schemes
  • Less susceptible to brute-force, but just as simple to implement
    • Using addition/subtraction
    • Bit rotation
    • ROT-n
    • Multibyte
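
The variations listed above can each be sketched in a few lines. All function names and sample values below are assumptions for illustration. Real samples often combine several of these steps or vary the key per byte.

```python
from itertools import cycle


def add_encode(data: bytes, key: int) -> bytes:
    """Add the key to every byte, wrapping at 256."""
    return bytes((b + key) & 0xFF for b in data)


def sub_decode(data: bytes, key: int) -> bytes:
    """Reverse add_encode by subtracting the key, wrapping at 256."""
    return bytes((b - key) & 0xFF for b in data)


def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate an 8-bit value right by count bits."""
    return rol8(value, 8 - (count % 8))


def rot_n(text: str, n: int) -> str:
    """Shift ASCII letters by n places in the alphabet (ROT-13 when n is 13)."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + n) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + n) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def multibyte_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating multi-byte key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


sample = b"cmd.exe /c whoami"  # made-up string for the example
print(sub_decode(add_encode(sample, 0x21), 0x21))
print(bytes(ror8(rol8(b, 3), 3) for b in sample))
print(rot_n(rot_n("Hello", 13), 13))
print(multibyte_xor(multibyte_xor(sample, b"\x12\x34\x56\x78"), b"\x12\x34\x56\x78"))
```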