## Z3 Solver Simplifying String Decryption

Notes on using Z3 Solver to simplify string deobfuscation

### Executive Summary

Z3 Solver aids in simplifying deobfuscation techinques. This post covers 2 example use cases where a convoluted string decryption routine is broken down and simplified into a single `XOR`

operation. Z3 is used to prove that the extra parts of the decryption routine cancel each other out.

### Case Study

During the course of reverse engineering a binary, I ran across some string decryption routines that I could easily decrypt using python along with a custom IDA plugin; however, I suspected the complete decryption algorithm did not need to be recreated. Instead of just using the decompiled algoritm, I wanted to see if these algorithms could be simplified.

#### Challenge 1

Figure 1 shows challenge 1’s decryption algorithm disassembled along with the encrypted bytes.

Hexray’s decompiler renders the following decompiled code, making it much easier to read.

```
byte_1001202E[result] = (~byte_1001202E[result] & 0xE)
+ byte_1001202E[result]
+ (byte_1001202E[result] ^ 0xE)
- (byte_1001202E[result] | 0xE);
```

Replacing `byte_1001202E[result]`

with `x`

, the algorithm is now represented as seen in code snippet 2.

```
( ~x & 0xe ) + x + ( x ^ 0xe ) - ( x | 0xe )
```

One of the clues that suggests that the algorithm can simplified is that the algorithm contains a mix of `or`

and `and`

operations combined with the same constant. Basically I’m looking for things that suggest one part is cancelling out another part.

#### Solving

One method to figure out how to simplify the equation is to break it up into chucks to see which parts are equal to each other.

First set `x`

equal to a value like so: `x = 0x4b`

and then get the result of each formula inside the parentheses.

`hex( ( ~x ) & 0xe )`

==`0x4`

`hex( x ^ 0xe )`

==`0x45`

`hex( x | 0xe )`

==`0x4f`

Nothing stands out yet. What if we expand the first formula to include the `+ x`

so `(( ~x ) & 0xe ) + x`

:

`hex((( ~x ) & 0xe ) + x)`

==`0x4f`

The value returned here matches the value returned from `( x | 0xe )`

.

Rearranging the original function to read as

```
( x ^ 0xe ) + ((( ~x ) & 0xe ) + x) - ( x | 0xe )
```

says that

`((( ~x ) & 0xe ) + x)`

minus `( x | 0xe )`

is `0`

.

Thus the original deobufscation formula is simply the XOR operation:

```
( x ^ 0xe )
```

##### Proving

Z3 Solver can be used to prove (or disprove) our working theory that for all values of `x`

, `((( ~x ) & 0xe ) + x)`

is equal to `( x | 0xe )`

. Z3 is a “satisfiability modulo theories (SMT) solver by Microsoft” which lends itself well to these types of applications.

The Z3 python script used is:

```
from z3 import *
x, c = BitVec("x", 8), BitVec("c", 8)
s = Solver()
s.add( (( ~x ) & c ) + x != (x | c) )
s.check()
```

This is not a course in Z3 (if interested check out Tim’s Software Deobfuscation Techniques course), but we’ll cover what the script is doing below.

**Steps:**

- After importing the
`z3`

library we set 2 variables:`x`

and`c`

.`x`

represents the`x`

in the original formula and`c`

(for constant) represents the`0xe`

value (really any single byte constant).- Both are set to an 8 bit vector as all of the operations are done using a single byte.

- Solver
`s`

is initalized - Add the condition to the solver
`(( ~x ) & c ) + x != (x | c)`

)`!=`

is used as we are asking Z3 if there are any`x`

and`c`

values combinations where the formula is not the same. If we used`==`

Z3 would stop after the first instance where the two values are equal, but that does not mean that there aren’t values where they are not equal, so use the negative`!=`

to check for equivalancy.

`s.check()`

will process the criteria and either return:`sat`

meaning it was**able**to find a set of values where the condition is`True`

`unsat`

meaning it is**unable**to find a set of values where the condition is`True`

Running the code renders `unsat`

implying that for all combinations of `x`

and `c`

the two forumlas are equal to each other.

#### IDA String Decryption

Armed with the knowledge that the decryption routine is simply `x ^ 0e`

, we can decrypt the string in IDA.

### Challenge 2

Figures 3 and 4 show the decompiled code and encrypted bytes for challenge 2.

The formula can be represented like this; again `x`

is the byte being decrypted:

```
(x & 0xCF) + (x ^ 0x30) + (~x | 0x30) + 1
```

#### Solving

As in challenge 1, break down the formula to see if any of the results overlap. For these `x`

is set to `0x63`

.

`hex(x & 0xCF)`

==`0x43`

`hex(x ^ 0x30)`

==`0x53`

`hex(~x | 0x30)`

==`-0x44`

`hex(~x | 0x30) + 1`

==`-0x43`

1 and 4 cancel each other out. Using this information, let’s again use Z3 to prove that for all values of `x`

when `hex(x & 0xCF)`

is added to `hex(~x | 0x30) + 1`

, they cancel each other out.

#### Proving

Z3 can be used to test our assumption that `(x & 0xCF)`

plus `(~x | 0x30) + 1`

is `0`

.

```
from z3 import *
x = BitVec("x", 8)
s = Solver()
s.add( (( ~x | 0x30 ) + 1) + (x & 0xcf) != 0)
s.check()
```

Z3 proves our theory that these two parts of the original formula cancel each other out. We were unable to set them `!=`

to each other due to the fact the 2 values are added together (unlike the prior example where the two numbers were subtracted), so we add the 2 results together to make sure they are `!=`

to `0`

.

##### A Step Further

Is there a relationship between the `0x30`

and the `0xcf`

?

Looking at the bit representation for each byte, notice that bits with `0`

in one have `1`

at that postion in the other.

`f"{0xcf:08b}"`

==`'11001111'`

`f"{0x30:08b}"`

==`'00110000'`

If we were to `XOR`

or `add`

them together they will return `0xff`

.

`f"{0x30^0xcf:08b}"`

==`'11111111'`

`f"{0x30+0xcf:08b}"`

==`'11111111'`

Using this knowledge, let’s update the Z3 Solver to see if by adding either of these conditions we can replace the hard coded constants with BitVec variables and still get the same result.

```
from z3 import *
x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)
s = Solver()
s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 + c2 )
s.check()
```

Additional conditions are passed to the solver using the `add`

method. More than one can be speficied at a time.

Figure 5 shows, with the extra condition of `0xff == c1 + c2`

, that the two formulas still cancel each other out.

```
from z3 import *
x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)
s = Solver()
s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 ^ c2 )
s.check()
```

Figure 6 shows, with the extra condition of `0xff == c1 ^ c2`

, that the two formulas also cancel each other out.

Whether the two constants need to add up to `0xff`

or the `XOR`

of the two need to equal to `0xff`

, I am unsure; however, this does suggest that it can’t be any 2 arbitrary constants.

Now that we’ve proven this, we can say for any formula that makes uses of these conditions, we can safely remove these parts from the decryption algorithm. In this case we are left with: `(x ^ 0x30)`

#### IDA String Decryption

The below video shows the string being decrypted using just `x ^ 0x30`

.

### Extra Observation

Most of the times I’ve seen C strings (even when encrypted) used, they are followed by a null byte. Looking at figures 1 and 4 you’ll notice the final byte is the null byte as it matches the XOR condition used in each respective challenge.

### Outlook

Obfuscation appears to be increasing across the malware landscape. One common obfusactor used is o-llvm which has been forked many times (1.1k at the time of this posting) on github. Z3 Solver can be used to help generate more complex decryption functions which add superfluous logic to slow down the malware analyst. I suspect Z3 along with o-llvm or a derivative were used here when obfuscating this binary. Recognizing and overcoming challenges such as these are important for a reverse engineer; however, going to this length may not be required.