Z3 Solver Simplifying String Decryption
Notes on using Z3 Solver to simplify string deobfuscation
Executive Summary
Z3 Solver aids in simplifying deobfuscation techinques. This post covers 2 example use cases where a convoluted string decryption routine is broken down and simplified into a single XOR
operation. Z3 is used to prove that the extra parts of the decryption routine cancel each other out.
Case Study
During the course of reverse engineering a binary, I ran across some string decryption routines that I could easily decrypt using python along with a custom IDA plugin; however, I suspected the complete decryption algorithm did not need to be recreated. Instead of just using the decompiled algoritm, I wanted to see if these algorithms could be simplified.
Challenge 1
Figure 1 shows challenge 1’s decryption algorithm disassembled along with the encrypted bytes.
Hexray’s decompiler renders the following decompiled code, making it much easier to read.
byte_1001202E[result] = (~byte_1001202E[result] & 0xE)
+ byte_1001202E[result]
+ (byte_1001202E[result] ^ 0xE)
- (byte_1001202E[result] | 0xE);
Replacing byte_1001202E[result]
with x
, the algorithm is now represented as seen in code snippet 2.
( ~x & 0xe ) + x + ( x ^ 0xe ) - ( x | 0xe )
One of the clues that suggests that the algorithm can simplified is that the algorithm contains a mix of or
and and
operations combined with the same constant. Basically I’m looking for things that suggest one part is cancelling out another part.
Solving
One method to figure out how to simplify the equation is to break it up into chucks to see which parts are equal to each other.
First set x
equal to a value like so: x = 0x4b
and then get the result of each formula inside the parentheses.
hex( ( ~x ) & 0xe )
==0x4
hex( x ^ 0xe )
==0x45
hex( x | 0xe )
==0x4f
Nothing stands out yet. What if we expand the first formula to include the + x
so (( ~x ) & 0xe ) + x
:
hex((( ~x ) & 0xe ) + x)
==0x4f
The value returned here matches the value returned from ( x | 0xe )
.
Rearranging the original function to read as
( x ^ 0xe ) + ((( ~x ) & 0xe ) + x) - ( x | 0xe )
says that
((( ~x ) & 0xe ) + x)
minus ( x | 0xe )
is 0
.
Thus the original deobufscation formula is simply the XOR operation:
( x ^ 0xe )
Proving
Z3 Solver can be used to prove (or disprove) our working theory that for all values of x
, ((( ~x ) & 0xe ) + x)
is equal to ( x | 0xe )
. Z3 is a “satisfiability modulo theories (SMT) solver by Microsoft” which lends itself well to these types of applications.
The Z3 python script used is:
from z3 import *
x, c = BitVec("x", 8), BitVec("c", 8)
s = Solver()
s.add( (( ~x ) & c ) + x != (x | c) )
s.check()
This is not a course in Z3 (if interested check out Tim’s Software Deobfuscation Techniques course), but we’ll cover what the script is doing below.
Steps:
- After importing the
z3
library we set 2 variables:x
andc
.x
represents thex
in the original formula andc
(for constant) represents the0xe
value (really any single byte constant).- Both are set to an 8 bit vector as all of the operations are done using a single byte.
- Solver
s
is initalized - Add the condition to the solver
(( ~x ) & c ) + x != (x | c)
)!=
is used as we are asking Z3 if there are anyx
andc
values combinations where the formula is not the same. If we used==
Z3 would stop after the first instance where the two values are equal, but that does not mean that there aren’t values where they are not equal, so use the negative!=
to check for equivalancy.
s.check()
will process the criteria and either return:sat
meaning it was able to find a set of values where the condition isTrue
unsat
meaning it is unable to find a set of values where the condition isTrue
Running the code renders unsat
implying that for all combinations of x
and c
the two forumlas are equal to each other.
IDA String Decryption
Armed with the knowledge that the decryption routine is simply x ^ 0e
, we can decrypt the string in IDA.
Challenge 2
Figures 3 and 4 show the decompiled code and encrypted bytes for challenge 2.
The formula can be represented like this; again x
is the byte being decrypted:
(x & 0xCF) + (x ^ 0x30) + (~x | 0x30) + 1
Solving
As in challenge 1, break down the formula to see if any of the results overlap. For these x
is set to 0x63
.
hex(x & 0xCF)
==0x43
hex(x ^ 0x30)
==0x53
hex(~x | 0x30)
==-0x44
hex(~x | 0x30) + 1
==-0x43
1 and 4 cancel each other out. Using this information, let’s again use Z3 to prove that for all values of x
when hex(x & 0xCF)
is added to hex(~x | 0x30) + 1
, they cancel each other out.
Proving
Z3 can be used to test our assumption that (x & 0xCF)
plus (~x | 0x30) + 1
is 0
.
from z3 import *
x = BitVec("x", 8)
s = Solver()
s.add( (( ~x | 0x30 ) + 1) + (x & 0xcf) != 0)
s.check()
Z3 proves our theory that these two parts of the original formula cancel each other out. We were unable to set them !=
to each other due to the fact the 2 values are added together (unlike the prior example where the two numbers were subtracted), so we add the 2 results together to make sure they are !=
to 0
.
A Step Further
Is there a relationship between the 0x30
and the 0xcf
?
Looking at the bit representation for each byte, notice that bits with 0
in one have 1
at that postion in the other.
f"{0xcf:08b}"
=='11001111'
f"{0x30:08b}"
=='00110000'
If we were to XOR
or add
them together they will return 0xff
.
f"{0x30^0xcf:08b}"
=='11111111'
f"{0x30+0xcf:08b}"
=='11111111'
Using this knowledge, let’s update the Z3 Solver to see if by adding either of these conditions we can replace the hard coded constants with BitVec variables and still get the same result.
from z3 import *
x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)
s = Solver()
s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 + c2 )
s.check()
Additional conditions are passed to the solver using the add
method. More than one can be speficied at a time.
Figure 5 shows, with the extra condition of 0xff == c1 + c2
, that the two formulas still cancel each other out.
from z3 import *
x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8)
s = Solver()
s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 ^ c2 )
s.check()
Figure 6 shows, with the extra condition of 0xff == c1 ^ c2
, that the two formulas also cancel each other out.
Whether the two constants need to add up to 0xff
or the XOR
of the two need to equal to 0xff
, I am unsure; however, this does suggest that it can’t be any 2 arbitrary constants.
Now that we’ve proven this, we can say for any formula that makes uses of these conditions, we can safely remove these parts from the decryption algorithm. In this case we are left with: (x ^ 0x30)
IDA String Decryption
The below video shows the string being decrypted using just x ^ 0x30
.
Extra Observation
Most of the times I’ve seen C strings (even when encrypted) used, they are followed by a null byte. Looking at figures 1 and 4 you’ll notice the final byte is the null byte as it matches the XOR condition used in each respective challenge.
Outlook
Obfuscation appears to be increasing across the malware landscape. One common obfusactor used is o-llvm which has been forked many times (1.1k at the time of this posting) on github. Z3 Solver can be used to help generate more complex decryption functions which add superfluous logic to slow down the malware analyst. I suspect Z3 along with o-llvm or a derivative were used here when obfuscating this binary. Recognizing and overcoming challenges such as these are important for a reverse engineer; however, going to this length may not be required.