Breaking Kryptonite's Obfuscation: A Static Analysis Approach Relying on Symbolic Execution
Introduction
Kryptonite was a proof-of-concept I built to obfuscate codes at the LLVM intermediate representation level. The idea was to use semantic-preserving transformations in order to not break the original program. One of the main idea was for example to build a home-made 32 bits adder to replace the add LLVM instruction. Instead of having a single asm instruction generated at the end of the pipeline, you will end up with a ton of assembly codes doing only an addition. If you never read my article, and you are interested in it here it is: Obfuscation of steel: meet my Kryptonite.
In this post I wanted to show you how we can manage to break that obfuscation with symbolic execution. We are going to write a really tiny symbolic execution engine with IDAPy, and we will use Z3Py to simplify all our equations. Note that a friend of mine @elvanderb used a similar approach (more generic though) to simplify some parts of the crackme ; but he didn’t wanted to publish it, so here is my blog post about it!
The target
In this blogpost we are first going to work on the LLVM code emitted by llvm-cpp-frontend-home-made-32bits-adder.cpp. Long story short, the code uses the LLVM frontend API to emit a home made 32 bits adder in the LLVM intermediate language. You can then feed the output directly to clang to generate a real executable binary for your platform, I chose to work only on the x86 platform here. I’ve also uploaded the binary here: adder.
So if you open the generated binary in IDA, you will see an interminable routine that only does an addition. At first glance, it really is kind of scary:
every instructions seems to be important, there is no junk codes
it seems that only binary operations are used: addition, left shift, right shift, xor, etc.
it’s also a two thousands instructions routine
The idea in this post is to write a very basic symbolic execution engine in order to see what kind of result will hold the EAX register at the end of the routine. Hopefully, we will obtain something highly simplified and more readable that this bunch of assembly codes!
The symbolic execution engine approach
But in fact that piece of code makes it really easy for us to write a symbolic execution engine. Here are the main reasons:
there is no branches, no loops, perfect.
the instruction aren’t playing with the EFLAGS register.
the instruction only used 32 bits registers (not 16 bits, or 8 bits).
the number of unique instruction is really small: mov, shr, shl, xor, and, xor, add.
the instructions used are easy to emulate.
Understand that here, we are really in a specific case, the engine wouldn’t be that easy to implement to cover the most used x86 instructions ; but we are lucky, we won’t need that!
The engine is in fact a pseudo-emulator that propagates the different actions done by the asm instructions. Here is how our engine works:
Each time a symbolic variable is found, you instantiate a Z3 BitVector and you keep it somewhere. A symbolic variable is basically a variable that the attacker can control. For example, in our case, we will have two symbolic variables: the two arguments passed to the function. We will see later an easy heuristic to find “automatically” the symbolic variables in our case.
When you have an instruction, you emulate it and you update the CPU state of the engine. If it involves an equation, you update your set of equations.
You do that until the end of the routine.
Of course, when the engine has been successfully executed, you may want to ask it some questions like “what does hold the EAX register at the end of the routine?”. You want to have exactly all the operations needed to compute EAX. In our case, we hope to obtain “symbolic_variable1 + symbolic_variable2”.
Here is a little example to sum up what we just said:
1234567891011
moveax,[arg1]; at this moment we have our first symbolic variable; we push it in our equations listmovedx,[arg2]; same thing hereshreax,2; EAX=sym1 >> 2addeax,1; EAX=(sym1 >> 2) + 1shleax,3; EAX=((sym1 >> 2) + 1) << 1andeax,2; EAX=(((sym1 >> 2) + 1) << 1) & 2incedx; EDX=sym2 + 1xoredx,eax; EDX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)moveax,edx; EAX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)
So at the end, you can ask the engine to give you the final state of EAX for example and it should give you something like:
1
EAX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)
With that equation you are free to use Z3Py to either simplify it or to try to find how you can have a specific value in EAX controlling only the symbolic variables:
As you can imagine, that kind of tool is very valuable/handy when you do reverse-engineering tasks or bug-hunting. Unfortunately, our PoC won’t be enough accurate/generic/complete to be used in “normal” cases, but never mind.
Let’s code
To implement our little PoC we will use only IDAPython and Z3Py.
The disassembler
The first thing we have to do is to use IDA’s API in order to have some inspection information about assembly instructions. The idea is just to have the mnemonic, the source and the destination operands easily ; here is the class I’ve designed toward that purpose:
Disassembler class
123456789101112131415161718192021222324252627
classDisassembler(object):'''A simple class to decode easily instruction in IDA'''def__init__(self,start,end):self.start=startself.end=endself.eip=startdef_decode_instr(self):'''Returns mnemonic, dst, src'''mnem=GetMnem(self.eip)x=[]foriinrange(2):ty=GetOpType(self.eip,i)# cstif5<=ty<=7:x.append(GetOperandValue(self.eip,i))else:x.append(GetOpnd(self.eip,i))return[mnem]+xdefget_next_instruction(self):'''This is a convenient generator, you can iterator through each instructions easily'''whileself.eip!=self.end:yieldself._decode_instr()self.eip+=ItemSize(self.eip)
The symbolic execution engine
There are several important parts in our engine:
the part which “emulates” the assembly instruction.
the part which stores the different equations used through the routine. It is a simple Python dictionary: the key is a unique identifier, and the value is the equation
the CPU state. We also use a dictionary for that purpose: the key will be the register names, and the value will be what the register holds at that specific moment. Note we will only store the unique identifier of the equation. In fact, our design is really similar to Jonathan’s one in “Binary analysis: Concolic execution with Pin and z3”, so please refer you to his cool pictures if it’s not really clear :P.
the memory state ; in that dictionary we store memory references. Remember, if we find a non-initialized access to a memory area we instantiate a symbolic variable. That is our heuristic to find the symbolic variables automatically.
defprove(f):'''Taken from http://rise4fun.com/Z3Py/tutorialcontent/guide#h26'''s=Solver()s.add(Not(f))ifs.check()==unsat:returnTruereturnFalseclassSymbolicExecutionEngine(object):'''The symbolic execution engine is the class that will handle the symbolic execution. It will keep a track of the different equations encountered, and the CPU context at each point of the program. The symbolic variables have to be found by the user (or using data-taing). This is not the purpose of this class. We are lucky, we only need to handle those operations & encodings: . mov: . mov reg32, reg32 . mov reg32, [mem] . mov [mem], reg32 . shr: . shr reg32, cst . shl: . shl reg32, cst . and: . and reg32, cst . and reg32, reg32 . xor: . xor reg32, cst . or: . or reg32, reg32 . add: . add reg32, reg32 We also don't care about: . EFLAGS . branches . smaller registers (16/8 bits) Long story short: it's perfect ; that environment makes really easy to play with symbolic execution.'''def__init__(self,start,end):# This is the CPU context at each time# The value of the registers are index in the equations dictionnaryself.ctx={'eax':None,'ebx':None,'ecx':None,'edx':None,'esi':None,'edi':None,'ebp':None,'esp':None,'eip':None}# The address where the symbolic execution will startself.start=start# The address where the symbolic execution will stopself.end=end# Our disassemblerself.disass=Disassembler(start,end)# This is the memory that can be used by the instructions to save temporary values/resultsself.mem={}# Each equation must have a unique idself.idx=0# The symbolic variables will be stored thereself.sym_variables=[]# Each equation will be stored hereself.equations={}def_check_if_reg32(self,r):'''XXX: make a decorator?'''returnr.lower()inself.ctxdef_push_equation(self,e):self.equations[self.idx]=eself.idx+=1return(self.idx-1)defset_reg_with_equation(self,r,e):ifself._check_if_reg32(r)==False:returnself.ctx[r]=self._push_equation(e)defget_reg_equation(self,r):ifself._check_if_reg32(r)==False:returnreturnself.equations[self.ctx[r]]defrun(self):'''Run from start address to end address the engine'''formnemonic,dst,srcinself.disass.get_next_instruction():ifmnemonic=='mov':# mov reg32, reg32ifsrcinself.ctxanddstinself.ctx:self.ctx[dst]=self.ctx[src]# mov reg32, [mem]elif(src.find('var_')!=-1orsrc.find('arg')!=-1)anddstinself.ctx:ifsrcnotinself.mem:# A non-initialized location is trying to be read, we got a symbolic variable!sym=BitVec('arg%d'%len(self.sym_variables),32)self.sym_variables.append(sym)print'Trying to read a non-initialized area, we got a new symbolic variable: %s'%symself.mem[src]=self._push_equation(sym)self.ctx[dst]=self.mem[src]# mov [mem], reg32elifdst.find('var_')!=-1andsrcinself.ctx:ifdstnotinself.mem:self.mem[dst]=Noneself.mem[dst]=self.ctx[src]else:raiseException('This encoding of "mov" is not handled.')elifmnemonic=='shr':# shr reg32, cstifdstinself.ctxandtype(src)==int:self.set_reg_with_equation(dst,LShR(self.get_reg_equation(dst),src))else:raiseException('This encoding of "shr" is not handled.')elifmnemonic=='shl':# shl reg32, cstifdstinself.ctxandtype(src)==int:self.set_reg_with_equation(dst,self.get_reg_equation(dst)<<src)else:raiseException('This encoding of "shl" is not handled.')elifmnemonic=='and':x=None# and reg32, cstiftype(src)==int:x=src# and reg32, reg32elifsrcinself.ctx:x=self.get_reg_equation(src)else:raiseException('This encoding of "and" is not handled.')self.set_reg_with_equation(dst,self.get_reg_equation(dst)&x)elifmnemonic=='xor':# xor reg32, cstifdstinself.ctxandtype(src)==int:self.set_reg_with_equation(dst,self.get_reg_equation(dst)^src)else:raiseException('This encoding of "xor" is not handled.')elifmnemonic=='or':# or reg32, reg32ifdstinself.ctxandsrcinself.ctx:self.set_reg_with_equation(dst,self.get_reg_equation(dst)|self.get_reg_equation(src))else:raiseException('This encoding of "or" is not handled.')elifmnemonic=='add':# add reg32, reg32ifdstinself.ctxandsrcinself.ctx:self.set_reg_with_equation(dst,self.get_reg_equation(dst)+self.get_reg_equation(src))else:raiseException('This encoding of "add" is not handled.')else:printmnemonic,dst,srcraiseException('This instruction is not handled.')defget_reg_equation_simplified(self,reg):eq=self.get_reg_equation(reg)eq=simplify(eq)returneq
Testing
OK, we just have to instantiate the engine giving him the start/end address of the routine and to ask him to give us the final equation holded in EAX.
main
123456789101112131415
defmain():'''Here we will try to attack the semantic-preserving obfuscations I talked about in "Obfuscation of steel: meet my Kryptonite." : http://0vercl0k.tuxfamily.org/bl0g/?p=260. The idea is to defeat those obfuscations using a tiny symbolic execution engine.'''sym=SymbolicExecutionEngine(0x804845A,0x0804A17C)print'Launching the engine..'sym.run()print'Done, retrieving the equation in EAX, and simplifying..'eax=sym.get_reg_equation_simplified('eax')print'EAX=%r'%eaxreturn1if__name__=='__main__':main()
There was two possible explanations for this problem:
my code is wrong, and it generates equations not simplify-able.
my code is right, and Z3Py’s simplify method has a hard time to simplify it.
To know what was the right answer, I used Z3Py’s prove function in order to know if the equation was equivalent to a simple addition:
main
123456789101112131415
defmain():'''Here we will try to attack the semantic-preserving obfuscations I talked about in "Obfuscation of steel: meet my Kryptonite." : http://0vercl0k.tuxfamily.org/bl0g/?p=260. The idea is to defeat those obfuscations using a tiny symbolic execution engine.'''sym=SymbolicExecutionEngine(0x804845A,0x0804A17C)print'Launching the engine..'sym.run()print'Done, retrieving the equation in EAX, and simplifying..'eax=sym.get_reg_equation_simplified('eax')printprove(eax==Sum(sym.sym_variables))return1if__name__=='__main__':main()
Fortunately for us, it printed True ; so our code is correct. But it also means, the simplify function, as is at least, isn’t able to simplify that bunch of equations involving bit-vector arithmetics. I still haven’t found a clean way to make Z3Py simplify my big equation, so if someone knows how I can do that please contact me. I’ve also exported the complete equation, and uploaded it here ; you are free to give it a try like this.
The ugly trick I came up with is just to use Z3Py’s prove function, to try to prove that the equation is in fact an addition and if this is the case it returns the simplified equation. Again, if someone manages to simplify the previous equation without that type of trick I’m really interested!
nasty trick
123456789101112131415161718
def_simplify_additions(self,eq):'''The idea in this function is to help Z3 to simplify our big bitvec-arithmetic expression. It's simple, in eq we have a big expression with two symbolic variables (arg0 & arg1) and a lot of bitvec arithmetic. Somehow, the simplify function is not clever enough to reduce the equation. The idea here is to use the prove function in order to see if we can simplify an equation by an addition of the symbolic variables.'''# The two expressions are equivalent ; we got a simplification!ifprove(Sum(self.sym_variables)==eq):returnSum(self.sym_variables)returneqdefget_reg_equation_simplified(self,reg):eq=self.get_reg_equation(reg)eq=simplify(self._simplify_additions(eq))returneq
And now if you relaunch the script you will get:
12345
Launching the engine..
Trying to read a non-initialized area, we got a new symbolic variable: arg0
Trying to read a non-initialized area, we got a new symbolic variable: arg1
Done, retrieving the equation in EAX, and simplifying..
EAX=arg0 + arg1
We just successfully simplified two thousands of assembly into a simple addition, wonderful!
Symbolic execution VS Kryptonite
OK, now we have a working engine able to break a small program (~two thousands instructions), let’s see if we can do the same with a kryptonized-binary. Let’s take a simple addition like in the previous parts:
At this moment we end up with that binary: kryptonite-add. The target routine for our study starts at 0x804823C and ends at 0x08072284 ; roughly more than 40 thousands assembly instructions and kind of big right?
Here is our final IDAPython script after some minor adjustments (added one or two more instructions):
classEquationId(object):def__init__(self,id_):self.id=id_def__repr__(self):return'EID:%d'%self.idclassDisassembler(object):'''A simple class to decode easily instruction in IDA'''def__init__(self,start,end):self.start=startself.end=endself.eip=startdef_decode_instr(self):'''Returns mnemonic, dst, src'''mnem=GetMnem(self.eip)x=[]foriinrange(2):ty=GetOpType(self.eip,i)# cstif5<=ty<=7:x.append(GetOperandValue(self.eip,i))else:x.append(GetOpnd(self.eip,i))return[mnem]+xdefget_next_instruction(self):'''This is a convenient generator, you can iterator through each instructions easily'''whileself.eip!=self.end:yieldself._decode_instr()self.eip+=ItemSize(self.eip)classSymbolicExecutionEngine(object):'''The symbolic execution engine is the class that will handle the symbolic execution. It will keep a track of the different equations encountered, and the CPU context at each point of the program. The symbolic variables have to be found by the user (or using data-taing). This is not the purpose of this class. We are lucky, we only need to handle those operations & encodings: . mov: . mov reg32, reg32 . mov reg32, [mem] . mov [mem], reg32 . mov reg32, cst . shr: . shr reg32, cst . shl: . shl reg32, cst . and: . and reg32, cst . and reg32, reg32 . xor: . xor reg32, cst . or: . or reg32, reg32 . add: . add reg32, reg32 . add reg32, cst We also don't care about: . EFLAGS . branches . smaller registers (16/8 bits) Long story short: it's perfect ; that environment makes really easy to play with symbolic execution.'''def__init__(self,start,end):# This is the CPU context at each time# The value of the registers are index in the equations dictionnaryself.ctx={'eax':None,'ebx':None,'ecx':None,'edx':None,'esi':None,'edi':None,'ebp':None,'esp':None,'eip':None}# The address where the symbolic execution will startself.start=start# The address where the symbolic execution will stopself.end=end# Our disassemblerself.disass=Disassembler(start,end)# This is the memory that can be used by the instructions to save temporary values/resultsself.mem={}# Each equation must have a unique idself.idx=0# The symbolic variables will be stored thereself.sym_variables=[]# Each equation will be stored hereself.equations={}# Number of instructions emulatedself.ninstrs=0def_check_if_reg32(self,r):'''XXX: make a decorator?'''returnr.lower()inself.ctxdef_push_equation(self,e):idx=EquationId(self.idx)self.equations[idx]=eself.idx+=1returnidxdefset_reg_with_equation(self,r,e):ifself._check_if_reg32(r)==False:returnself.ctx[r]=self._push_equation(e)defget_reg_equation(self,r):ifself._check_if_reg32(r)==False:returnifisinstance(self.ctx[r],EquationId):returnself.equations[self.ctx[r]]else:returnself.ctx[r]defrun(self):'''Run from start address to end address the engine'''formnemonic,dst,srcinself.disass.get_next_instruction():if(self.ninstrs%5000)==0andself.ninstrs>0:print'%d instructions, %d equations so far...'%(self.ninstrs,len(self.equations))ifmnemonic=='mov':# mov reg32, imm32ifdstinself.ctxandisinstance(src,(int,long)):self.ctx[dst]=src# mov reg32, reg32elifsrcinself.ctxanddstinself.ctx:self.ctx[dst]=self.ctx[src]# mov reg32, [mem]elif(src.find('var_')!=-1orsrc.find('arg')!=-1)anddstinself.ctx:ifsrcnotinself.mem:# A non-initialized location is trying to be read, we got a symbolic variable!sym=BitVec('arg%d'%len(self.sym_variables),32)self.sym_variables.append(sym)print'Trying to read a non-initialized area, we got a new symbolic variable: %s'%symself.mem[src]=self._push_equation(sym)self.ctx[dst]=self.mem[src]# mov [mem], reg32elifdst.find('var_')!=-1andsrcinself.ctx:self.mem[dst]=self.ctx[src]else:raiseException('This encoding of "mov" is not handled.')elifmnemonic=='shr':# shr reg32, cstifdstinself.ctxandisinstance(src,(int,long)):self.set_reg_with_equation(dst,self.get_reg_equation(dst)>>src)else:raiseException('This encoding of "shr" is not handled.')elifmnemonic=='shl':# shl reg32, cstifdstinself.ctxandisinstance(src,(int,long)):self.set_reg_with_equation(dst,self.get_reg_equation(dst)<<src)else:raiseException('This encoding of "shl" is not handled.')elifmnemonic=='and':# and reg32, cstifisinstance(src,(int,long)):x=src# and reg32, reg32elifsrcinself.ctx:x=self.get_reg_equation(src)else:raiseException('This encoding of "and" is not handled.')self.set_reg_with_equation(dst,self.get_reg_equation(dst)&x)elifmnemonic=='xor':# xor reg32, cstifdstinself.ctxandisinstance(src,(int,long)):ifself.ctx[dst]notinself.equations:self.ctx[dst]^=srcelse:self.set_reg_with_equation(dst,self.get_reg_equation(dst)^src)else:raiseException('This encoding of "xor" is not handled.')elifmnemonic=='or':# or reg32, reg32ifdstinself.ctxandsrcinself.ctx:self.set_reg_with_equation(dst,self.get_reg_equation(dst)|self.get_reg_equation(src))else:raiseException('This encoding of "or" is not handled.')elifmnemonic=='add':# add reg32, reg32ifdstinself.ctxandsrcinself.ctx:self.set_reg_with_equation(dst,self.get_reg_equation(dst)+self.get_reg_equation(src))# add reg32, cstelifdstinself.ctxandisinstance(src,(int,long)):self.set_reg_with_equation(dst,self.get_reg_equation(dst)+src)else:raiseException('This encoding of "add" is not handled.')else:printmnemonic,dst,srcraiseException('This instruction is not handled.')self.ninstrs+=1def_simplify_additions(self,eq):'''The idea in this function is to help Z3 to simplify our big bitvec-arithmetic expression. It's simple, in eq we have a big expression with two symbolic variables (arg0 & arg1) and a lot of bitvec arithmetic. Somehow, the simplify function is not clever enough to reduce the equation. The idea here is to use the prove function in order to see if we can simplify an equation by an addition of the symbolic variables.'''# The two expressions are equivalent ; we got a simplification!ifprove_(Sum(self.sym_variables)==eq):returnSum(self.sym_variables)returneqdefget_reg_equation_simplified(self,reg):eq=self.get_reg_equation(reg)eq=simplify(self._simplify_additions(eq))returneqdefmain():'''Here we will try to attack the semantic-preserving obfuscations I talked about in "Obfuscation of steel: meet my Kryptonite." : http://0vercl0k.tuxfamily.org/bl0g/?p=260. The idea is to defeat those obfuscations using a tiny symbolic execution engine.'''# sym = SymbolicExecutionEngine(0x804845A, 0x0804A17C) # for simple addersym=SymbolicExecutionEngine(0x804823C,0x08072284)# adder kryptonizedprint'Launching the engine..'sym.run()print'Done. %d equations built, %d assembly lines emulated, %d virtual memory cells used'%(len(sym.equations),sym.ninstrs,len(sym.mem))print'CPU state at the end:'printsym.ctxprint'Retrieving and simplifying the EAX register..'eax=sym.get_reg_equation_simplified('eax')print'EAX=%r'%eaxreturn1if__name__=='__main__':main()
And here is the final output:
12345678910111213141516
Launching the engine..
Trying to read a non-initialized area, we got a new symbolic variable: arg0
Trying to read a non-initialized area, we got a new symbolic variable: arg1
5000 instructions, 2263 equations so far...
10000 instructions, 4832 equations so far...
15000 instructions, 7228 equations so far...
20000 instructions, 9766 equations so far...
25000 instructions, 12212 equations so far...
30000 instructions, 14762 equations so far...
35000 instructions, 17255 equations so far...
40000 instructions, 19801 equations so far...
Done. 19857 equations built, 40130 assembly lines emulated, 5970 virtual memory cells used
CPU state at the end:
{'eax': EID:19856, 'ebp': None, 'eip': None, 'esp': None, 'edx': EID:19825, 'edi': EID:19796, 'ebx': EID:19797, 'esi': EID:19823, 'ecx': EID:19856}
Retrieving and simplifying the EAX register..
EAX=arg0 + arg1
Conclusion
I hope you did enjoy this little introduction to symbolic execution, and how it can be very valuable to remove some semantic-preserving obfuscations. We also have seen that this PoC is not really elaborate: it doesn’t handle loops or any branches, doesn’t care about EFLAGS, etc ; but it was enough to break our two examples. I hope you also enjoyed the examples used to showcase our tiny symbolic execution engine.
If you want to go further with symbolic execution, here is a list of nice articles: