Z3 Solver Simplifying String Decryption

Notes on using Z3 Solver to simplify string deobfuscation

January 8, 2023

xorhex

8-Minute Read

Z3 Simplifying String Decryption Blog Header Picture

Executive Summary

Z3 Solver aids in simplifying deobfuscation techinques. This post covers 2 example use cases where a convoluted string decryption routine is broken down and simplified into a single XOR operation. Z3 is used to prove that the extra parts of the decryption routine cancel each other out.

Case Study

During the course of reverse engineering a binary, I ran across some string decryption routines that I could easily decrypt using python along with a custom IDA plugin; however, I suspected the complete decryption algorithm did not need to be recreated. Instead of just using the decompiled algoritm, I wanted to see if these algorithms could be simplified.

Challenge 1

Figure 1 shows challenge 1’s decryption algorithm disassembled along with the encrypted bytes.

Figure 1: Encrypted Bytes

Hexray’s decompiler renders the following decompiled code, making it much easier to read.

   byte_1001202E[result] = (~byte_1001202E[result] & 0xE)
                          + byte_1001202E[result]
                          + (byte_1001202E[result] ^ 0xE)
                          - (byte_1001202E[result] | 0xE);

Code Snippet 1: IDAPro Decompiled Code

Replacing byte_1001202E[result] with x, the algorithm is now represented as seen in code snippet 2.

( ~x & 0xe ) + x + ( x ^ 0xe ) - ( x | 0xe )

Code Snippet 2: Fuction Simplified

One of the clues that suggests that the algorithm can simplified is that the algorithm contains a mix of or and and operations combined with the same constant. Basically I’m looking for things that suggest one part is cancelling out another part.

Solving

One method to figure out how to simplify the equation is to break it up into chucks to see which parts are equal to each other.

First set x equal to a value like so: x = 0x4b and then get the result of each formula inside the parentheses.

hex( ( ~x ) & 0xe ) == 0x4
hex( x ^ 0xe ) == 0x45
hex( x | 0xe ) == 0x4f

Nothing stands out yet. What if we expand the first formula to include the + x so (( ~x ) & 0xe ) + x:

hex((( ~x ) & 0xe ) + x) == 0x4f

The value returned here matches the value returned from ( x | 0xe ).

Rearranging the original function to read as

( x ^ 0xe ) + ((( ~x ) & 0xe ) + x) - ( x | 0xe )

says that

((( ~x ) & 0xe ) + x) minus ( x | 0xe ) is 0.

Thus the original deobufscation formula is simply the XOR operation:

( x ^ 0xe )

Proving

Z3 Solver can be used to prove (or disprove) our working theory that for all values of x, ((( ~x ) & 0xe ) + x) is equal to ( x | 0xe ). Z3 is a “satisfiability modulo theories (SMT) solver by Microsoft” which lends itself well to these types of applications.

The Z3 python script used is:

from z3 import *

x, c = BitVec("x", 8), BitVec("c", 8)

s = Solver()

s.add( (( ~x ) & c ) + x != (x | c) )

s.check()

Code Snippet 3: Z3 Solver Proof

This is not a course in Z3 (if interested check out Tim’s Software Deobfuscation Techniques course), but we’ll cover what the script is doing below.

Steps:

After importing the z3 library we set 2 variables: x and c.
- x represents the x in the original formula and c (for constant) represents the 0xe value (really any single byte constant).
- Both are set to an 8 bit vector as all of the operations are done using a single byte.
Solver s is initalized
Add the condition to the solver (( ~x ) & c ) + x != (x | c))
- != is used as we are asking Z3 if there are any x and c values combinations where the formula is not the same. If we used == Z3 would stop after the first instance where the two values are equal, but that does not mean that there aren’t values where they are not equal, so use the negative != to check for equivalancy.
s.check() will process the criteria and either return:
- sat meaning it was able to find a set of values where the condition is True
- unsat meaning it is unable to find a set of values where the condition is True

Running the code renders unsat implying that for all combinations of x and c the two forumlas are equal to each other.

Figure 1: Result of Z3 Python Script

IDA String Decryption

Armed with the knowledge that the decryption routine is simply x ^ 0e, we can decrypt the string in IDA.

Challenge 2

Figures 3 and 4 show the decompiled code and encrypted bytes for challenge 2.

Figure 3: Decryption Routine

Figure 4: Encrypted Bytes

The formula can be represented like this; again x is the byte being decrypted:

  (x & 0xCF) + (x ^ 0x30) + (~x | 0x30) + 1

Code Snippet 4: Challenge 2 Formula

Solving

As in challenge 1, break down the formula to see if any of the results overlap. For these x is set to 0x63.

hex(x & 0xCF) == 0x43
hex(x ^ 0x30) == 0x53
hex(~x | 0x30) == -0x44
hex(~x | 0x30) + 1 == -0x43

1 and 4 cancel each other out. Using this information, let’s again use Z3 to prove that for all values of x when hex(x & 0xCF) is added to hex(~x | 0x30) + 1, they cancel each other out.

Proving

Z3 can be used to test our assumption that (x & 0xCF) plus (~x | 0x30) + 1 is 0.

from z3 import *

x = BitVec("x", 8)

s = Solver()

s.add( (( ~x | 0x30 ) + 1) + (x & 0xcf) != 0)

s.check()

Code Snippet 5: Challenge 2 Z3 Solver Take 1

Figure 4: Z3 Proof

Z3 proves our theory that these two parts of the original formula cancel each other out. We were unable to set them != to each other due to the fact the 2 values are added together (unlike the prior example where the two numbers were subtracted), so we add the 2 results together to make sure they are != to 0.

A Step Further

Is there a relationship between the 0x30 and the 0xcf?

Looking at the bit representation for each byte, notice that bits with 0 in one have 1 at that postion in the other.

f"{0xcf:08b}" == '11001111'
f"{0x30:08b}" == '00110000'

If we were to XOR or add them together they will return 0xff.

f"{0x30^0xcf:08b}" == '11111111'
f"{0x30+0xcf:08b}" == '11111111'

Using this knowledge, let’s update the Z3 Solver to see if by adding either of these conditions we can replace the hard coded constants with BitVec variables and still get the same result.

from z3 import *

x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)

s = Solver()

s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 + c2 )

s.check()

Code Snippet 6: Challenge 2 Z3 Solver Take 2 Version 1 : Plus

Additional conditions are passed to the solver using the add method. More than one can be speficied at a time.

Figure 5: Challenge 2 Z3 Solver Take 2 Version 1 : Plus

Figure 5 shows, with the extra condition of 0xff == c1 + c2, that the two formulas still cancel each other out.

from z3 import *

x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)

s = Solver()

s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 ^ c2 )

s.check()

Code Snippet 7: Challenge 2 Z3 Solver Take 2 Version 2 : XOR

Figure 6: Challenge 2 Z3 Solver Take 2 Version 2 : XOR

Figure 6 shows, with the extra condition of 0xff == c1 ^ c2, that the two formulas also cancel each other out.

Whether the two constants need to add up to 0xff or the XOR of the two need to equal to 0xff, I am unsure; however, this does suggest that it can’t be any 2 arbitrary constants.

Now that we’ve proven this, we can say for any formula that makes uses of these conditions, we can safely remove these parts from the decryption algorithm. In this case we are left with: (x ^ 0x30)

IDA String Decryption

The below video shows the string being decrypted using just x ^ 0x30.

Extra Observation

Most of the times I’ve seen C strings (even when encrypted) used, they are followed by a null byte. Looking at figures 1 and 4 you’ll notice the final byte is the null byte as it matches the XOR condition used in each respective challenge.

Outlook

Obfuscation appears to be increasing across the malware landscape. One common obfusactor used is o-llvm which has been forked many times (1.1k at the time of this posting) on github. Z3 Solver can be used to help generate more complex decryption functions which add superfluous logic to slow down the malware analyst. I suspect Z3 along with o-llvm or a derivative were used here when obfuscating this binary. Recognizing and overcoming challenges such as these are important for a reverse engineer; however, going to this length may not be required.

xorhex