Z3 Solver Simplifying String Decryption
Notes on using Z3 Solver to simplify string deobfuscation
Z3 Solver aids in simplifying deobfuscation techinques. This post covers 2 example use cases where a convoluted string decryption routine is broken down and simplified into a single
XOR operation. Z3 is used to prove that the extra parts of the decryption routine cancel each other out.
During the course of reverse engineering a binary, I ran across some string decryption routines that I could easily decrypt using python along with a custom IDA plugin; however, I suspected the complete decryption algorithm did not need to be recreated. Instead of just using the decompiled algoritm, I wanted to see if these algorithms could be simplified.
Figure 1 shows challenge 1’s decryption algorithm disassembled along with the encrypted bytes.
Hexray’s decompiler renders the following decompiled code, making it much easier to read.
byte_1001202E[result] = (~byte_1001202E[result] & 0xE) + byte_1001202E[result] + (byte_1001202E[result] ^ 0xE) - (byte_1001202E[result] | 0xE);
x, the algorithm is now represented as seen in code snippet 2.
( ~x & 0xe ) + x + ( x ^ 0xe ) - ( x | 0xe )
One of the clues that suggests that the algorithm can simplified is that the algorithm contains a mix of
and operations combined with the same constant. Basically I’m looking for things that suggest one part is cancelling out another part.
One method to figure out how to simplify the equation is to break it up into chucks to see which parts are equal to each other.
x equal to a value like so:
x = 0x4b and then get the result of each formula inside the parentheses.
hex( ( ~x ) & 0xe )==
hex( x ^ 0xe )==
hex( x | 0xe )==
Nothing stands out yet. What if we expand the first formula to include the
+ x so
(( ~x ) & 0xe ) + x:
hex((( ~x ) & 0xe ) + x)==
The value returned here matches the value returned from
( x | 0xe ).
Rearranging the original function to read as
( x ^ 0xe ) + ((( ~x ) & 0xe ) + x) - ( x | 0xe )
((( ~x ) & 0xe ) + x) minus
( x | 0xe ) is
Thus the original deobufscation formula is simply the XOR operation:
( x ^ 0xe )
Z3 Solver can be used to prove (or disprove) our working theory that for all values of
((( ~x ) & 0xe ) + x) is equal to
( x | 0xe ). Z3 is a “satisfiability modulo theories (SMT) solver by Microsoft” which lends itself well to these types of applications.
The Z3 python script used is:
from z3 import * x, c = BitVec("x", 8), BitVec("c", 8) s = Solver() s.add( (( ~x ) & c ) + x != (x | c) ) s.check()
This is not a course in Z3 (if interested check out Tim’s Software Deobfuscation Techniques course), but we’ll cover what the script is doing below.
- After importing the
z3library we set 2 variables:
xin the original formula and
c(for constant) represents the
0xevalue (really any single byte constant).
- Both are set to an 8 bit vector as all of the operations are done using a single byte.
- Add the condition to the solver
(( ~x ) & c ) + x != (x | c))
!=is used as we are asking Z3 if there are any
cvalues combinations where the formula is not the same. If we used
==Z3 would stop after the first instance where the two values are equal, but that does not mean that there aren’t values where they are not equal, so use the negative
!=to check for equivalancy.
s.check()will process the criteria and either return:
satmeaning it was able to find a set of values where the condition is
unsatmeaning it is unable to find a set of values where the condition is
Running the code renders
unsat implying that for all combinations of
c the two forumlas are equal to each other.
IDA String Decryption
Armed with the knowledge that the decryption routine is simply
x ^ 0e, we can decrypt the string in IDA.
Figures 3 and 4 show the decompiled code and encrypted bytes for challenge 2.
The formula can be represented like this; again
x is the byte being decrypted:
(x & 0xCF) + (x ^ 0x30) + (~x | 0x30) + 1
As in challenge 1, break down the formula to see if any of the results overlap. For these
x is set to
hex(x & 0xCF)==
hex(x ^ 0x30)==
hex(~x | 0x30)==
hex(~x | 0x30) + 1==
1 and 4 cancel each other out. Using this information, let’s again use Z3 to prove that for all values of
hex(x & 0xCF) is added to
hex(~x | 0x30) + 1, they cancel each other out.
Z3 can be used to test our assumption that
(x & 0xCF) plus
(~x | 0x30) + 1 is
from z3 import * x = BitVec("x", 8) s = Solver() s.add( (( ~x | 0x30 ) + 1) + (x & 0xcf) != 0) s.check()
Z3 proves our theory that these two parts of the original formula cancel each other out. We were unable to set them
!= to each other due to the fact the 2 values are added together (unlike the prior example where the two numbers were subtracted), so we add the 2 results together to make sure they are
A Step Further
Is there a relationship between the
0x30 and the
Looking at the bit representation for each byte, notice that bits with
0 in one have
1 at that postion in the other.
If we were to
add them together they will return
Using this knowledge, let’s update the Z3 Solver to see if by adding either of these conditions we can replace the hard coded constants with BitVec variables and still get the same result.
from z3 import * x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8) s = Solver() s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 + c2 ) s.check()
Additional conditions are passed to the solver using the
add method. More than one can be speficied at a time.
Figure 5 shows, with the extra condition of
0xff == c1 + c2, that the two formulas still cancel each other out.
from z3 import * x, c1, c2 = BitVec("x", 8), BitVec("c1", 8), BitVec("c2", 8) s = Solver() s.add( (( ~x | c1 ) + 1) + (x & c2) != 0, 0xff == c1 ^ c2 ) s.check()
Figure 6 shows, with the extra condition of
0xff == c1 ^ c2, that the two formulas also cancel each other out.
Whether the two constants need to add up to
0xff or the
XOR of the two need to equal to
0xff, I am unsure; however, this does suggest that it can’t be any 2 arbitrary constants.
Now that we’ve proven this, we can say for any formula that makes uses of these conditions, we can safely remove these parts from the decryption algorithm. In this case we are left with:
(x ^ 0x30)
IDA String Decryption
The below video shows the string being decrypted using just
x ^ 0x30.
Most of the times I’ve seen C strings (even when encrypted) used, they are followed by a null byte. Looking at figures 1 and 4 you’ll notice the final byte is the null byte as it matches the XOR condition used in each respective challenge.
Obfuscation appears to be increasing across the malware landscape. One common obfusactor used is o-llvm which has been forked many times (1.1k at the time of this posting) on github. Z3 Solver can be used to help generate more complex decryption functions which add superfluous logic to slow down the malware analyst. I suspect Z3 along with o-llvm or a derivative were used here when obfuscating this binary. Recognizing and overcoming challenges such as these are important for a reverse engineer; however, going to this length may not be required.