So far, we’ve mostly targeted return addresses on the stack for our exploits, with the occasional manipulation of a function pointer. However, there are many other locations we can target to hijack the program’s control flow. In this set of notes, we describe the Global Offset Table, a particularly useful structure that contains plenty of interesting code pointers for us to manipulate.
Binaries can be broadly divided into two categories: statically linked and dynamically linked. The first kind is self-contained, meaning the library code it needs is copied into the binary itself. Dynamically linked binaries, on the other hand, depend on system libraries that are loaded at runtime [1]. For instance, consider the following C code:
#include <stdio.h>

int main(int argc, char **argv){
    printf("Oh, it is on, like a prawn who yawns at dawn.\n");
    return 0;
}
After compiling and executing our binary, we get a string displayed on our
screen. Nice and easy. However, this dynamically linked binary calls the printf
function from the standard C library. We can list the shared libraries our
program depends on by running ldd sample, where sample is the binary compiled
from the C code shown above. We get the result:
linux-vdso.so.1 => (0x00007ffff7ffa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7a0d000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7dd7000)
The values in parentheses are the base addresses at which these libraries are
loaded. Note that these addresses will change due to ASLR. You can see this for
yourself by running ldd sample multiple times. How can our simple binary know
the addresses of these libraries if they change every time the program runs?
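The same observation can be made from inside a running process. The following is a minimal, Linux-only sketch that reads /proc/self/maps; the function names mapping_base and print_libc_base are hypothetical, and it assumes the C library's file name contains "libc" and that each line of the maps file fits in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Return the start address of the first mapping in /proc/self/maps whose
 * line contains `needle`, or 0 if there is none. The file is sorted by
 * address, so the first match for a library is its lowest mapping. */
unsigned long mapping_base(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    char line[1024];              /* assumption: lines fit in this buffer */
    unsigned long base = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start;
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, needle) != NULL &&
            sscanf(line, "%lx-", &start) == 1) {
            base = start;
            break;
        }
    }
    fclose(maps);
    return base;
}

/* Print where the C library landed in this run (0 if no match). */
void print_libc_base(void)
{
    printf("libc base this run: 0x%lx\n", mapping_base("libc"));
}
```

Calling print_libc_base() from main and running the program several times should print a different base each time when ASLR is enabled.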
To find out, let’s first disassemble our binary by running gdb sample. Let’s
also set the syntax to Intel style by running set disassembly-flavor intel.
Run disas main and we get:
Dump of assembler code for function main:
0x0000000000400526 <+0>: push rbp
0x0000000000400527 <+1>: mov rbp,rsp
0x000000000040052a <+4>: sub rsp,0x10
0x000000000040052e <+8>: mov DWORD PTR [rbp-0x4],edi
0x0000000000400531 <+11>: mov QWORD PTR [rbp-0x10],rsi
0x0000000000400535 <+15>: mov edi,0x4005d4
0x000000000040053a <+20>: call 0x400400 <puts@plt>
0x000000000040053f <+25>: mov eax,0x0
0x0000000000400544 <+30>: leave
0x0000000000400545 <+31>: ret
End of assembler dump.
We see values loaded into registers and whatnot, but what we’re really interested in is this line:
0x000000000040053a <+20>: call 0x400400 <puts@plt>
This is where our external function gets called, and it shows how the program
finds the address to call. One minor note: the binary uses puts@plt instead of
printf@plt because of a compiler optimization. Namely, the compiler sees that
we are printing a constant string that has no format specifiers and ends in a
newline, and decides that printf is not needed. However, the compiler’s
decision does not change the intended functionality of our binary nor the
point we are making in this post.
Interpreting this line, the program will call the entry for puts in the
procedure linkage table (plt or PLT) at address 0x400400. When you compile your
binary, the compiler leaves behind entries called relocations, which the linker
fills in later, either at link time or at runtime [5]. The linker determines
some value, maybe an address, and places that value inside the binary at some
offset [5]. You can look at the relocations the compiler leaves behind by
running gcc -c sample.c to compile and then readelf --relocs ./sample.o, which
gives us:
Relocation section '.rela.text' at offset 0x208 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000010  00050000000a R_X86_64_32       0000000000000000 .rodata + 0
000000000015  000a00000002 R_X86_64_PC32     0000000000000000 puts - 4
Relocation section '.rela.eh_frame' at offset 0x238 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
As you can see, there’s a relocation for the puts function, so the linker needs
to find the address of this function in libc and place it inside our binary:
000000000015 000a00000002 R_X86_64_PC32 0000000000000000 puts - 4
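The "puts - 4" addend can be checked by hand. The x86-64 psABI defines an R_X86_64_PC32 value as S + A - P, where S is the address the symbol resolves to, A is the addend, and P is the address of the 4-byte field being patched. The sketch below uses hypothetical function names and plugs in addresses from the linked binary. It assumes the field starts one byte into the call instruction at 0x40053a, which matches the relocation offset 0x15 (main+21) in sample.o.

```c
#include <stdint.h>

/* Value the linker writes for an R_X86_64_PC32 relocation: S + A - P. */
int32_t pc32_value(uint64_t s, int64_t a, uint64_t p)
{
    return (int32_t)((int64_t)s + a - (int64_t)p);
}

/* Target of a relative call: address of the next instruction plus the
 * signed 32-bit displacement stored in the instruction. */
uint64_t call_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With S = 0x400400 (puts@plt), A = -4, and P = 0x40053b, pc32_value gives -0x13f. Adding that to the address of the next instruction, 0x40053f, lands on 0x400400. The addend is -4 because the CPU measures the displacement from the end of the 4-byte field rather than from its start.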
Now, the PLT, as mentioned above, helps find these addresses. But how does it
do that? Well, first let’s examine the address 0x400400, the one referenced
when calling puts. We run disas 0x400400 and we get:
Dump of assembler code for function puts@plt:
0x0000000000400400 <+0>: jmp QWORD PTR [rip+0x200c12] # 0x601018
0x0000000000400406 <+6>: push 0x0
0x000000000040040b <+11>: jmp 0x4003f0
End of assembler dump.
Interesting: we jumped to the PLT, but we then immediately jump somewhere
else, specifically to whatever address is stored at 0x601018. This technique of
jumping to a location and then immediately jumping somewhere else based on a
stored value is sometimes called function trampolining.
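The 0x601018 in gdb's comment comes from simple arithmetic on the jmp instruction. Here is a small sketch, with a hypothetical function name, assuming the usual x86-64 rule that rip holds the address of the next instruction while the current one executes:

```c
#include <stdint.h>

/* Address of the memory slot read by `jmp QWORD PTR [rip+disp]`.
 * next_insn is the address of the instruction that follows the jmp. */
uint64_t rip_relative_slot(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}
```

For puts@plt, the instruction after the jmp sits at 0x400406 and the displacement is 0x200c12, so rip_relative_slot(0x400406, 0x200c12) yields 0x601018.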
Actually, we have trampolined through an address stored in the global offset
table (got or GOT). We can see where the global offset table begins and what’s
stored at that address by running x _GLOBAL_OFFSET_TABLE_, which gives us
0x600e28: 0x00000001. We can see more of the GOT, including address 0x601018,
by running x/128wx _GLOBAL_OFFSET_TABLE_ and we get:
0x600e28: 0x00000001 0x00000000 0x00000001 0x00000000
0x600e38: 0x0000000c 0x00000000 0x004003c8 0x00000000
0x600e48: 0x0000000d 0x00000000 0x004005c4 0x00000000
0x600e58: 0x00000019 0x00000000 0x00600e10 0x00000000
0x600e68: 0x0000001b 0x00000000 0x00000008 0x00000000
0x600e78: 0x0000001a 0x00000000 0x00600e18 0x00000000
0x600e88: 0x0000001c 0x00000000 0x00000008 0x00000000
0x600e98: 0x6ffffef5 0x00000000 0x00400298 0x00000000
0x600ea8: 0x00000005 0x00000000 0x00400318 0x00000000
0x600eb8: 0x00000006 0x00000000 0x004002b8 0x00000000
0x600ec8: 0x0000000a 0x00000000 0x0000003d 0x00000000
0x600ed8: 0x0000000b 0x00000000 0x00000018 0x00000000
0x600ee8: 0x00000015 0x00000000 0x00000000 0x00000000
0x600ef8: 0x00000003 0x00000000 0x00601000 0x00000000
0x600f08: 0x00000002 0x00000000 0x00000030 0x00000000
0x600f18: 0x00000014 0x00000000 0x00000007 0x00000000
0x600f28: 0x00000017 0x00000000 0x00400398 0x00000000
0x600f38: 0x00000007 0x00000000 0x00400380 0x00000000
0x600f48: 0x00000008 0x00000000 0x00000018 0x00000000
0x600f58: 0x00000009 0x00000000 0x00000018 0x00000000
0x600f68: 0x6ffffffe 0x00000000 0x00400360 0x00000000
0x600f78: 0x6fffffff 0x00000000 0x00000001 0x00000000
0x600f88: 0x6ffffff0 0x00000000 0x00400356 0x00000000
0x600f98: 0x00000000 0x00000000 0x00000000 0x00000000
0x600fa8: 0x00000000 0x00000000 0x00000000 0x00000000
0x600fb8: 0x00000000 0x00000000 0x00000000 0x00000000
0x600fc8: 0x00000000 0x00000000 0x00000000 0x00000000
0x600fd8: 0x00000000 0x00000000 0x00000000 0x00000000
0x600fe8: 0x00000000 0x00000000 0x00000000 0x00000000
0x600ff8: 0x00000000 0x00000000 0x00600e28 0x00000000
0x601008: 0x00000000 0x00000000 0x00000000 0x00000000
0x601018: 0x00400406 0x00000000 0x00400416 0x00000000
Ideally, the address stored at 0x601018 would point to where puts has been
loaded into memory. But it turns out that things are more complicated: the
address in the GOT is not updated until the program tries to call puts for the
first time. In the dump above, the slot still holds 0x00400406, which is the
push instruction right after the jmp in puts@plt. This process is called lazy
binding.
The first time the program calls puts, the GOT entry for puts actually points
to a special bit of code that causes the dynamic linker to look up puts in the
already-loaded library and update the GOT with the correct address. In future
calls to puts, the jmp in the PLT will go directly to puts (since the GOT entry
has been updated) rather than jumping to the linker code.
More concretely, the first time the program calls puts, the “special bit” of
code that’s executed starts with the remaining two instructions we saw in the
PLT. The first instruction (0x0000000000400406 <+6>: push 0x0) tells the linker
that function 0, i.e., puts, was called, and the second instruction
(0x000000000040040b <+11>: jmp 0x4003f0) jumps to the code that starts the
linking process.
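Lazy binding can be imitated in plain C with a table of function pointers. This is only a simulation under stated assumptions: got_sim, libc_sim, resolve, slow_path_puts, and puts_plt are hypothetical names standing in for the GOT, the library, the dynamic linker, the tail of the PLT stub, and puts@plt. The real mechanism uses indirect jumps in assembly rather than C calls.

```c
#include <stdio.h>

typedef int (*fn_t)(const char *);

static int resolver_calls = 0;        /* how often the slow path ran */

/* Stands in for the real puts inside libc. */
static int real_puts(const char *s)
{
    return puts(s);
}

/* Simulated library: function 0 is puts. */
static const fn_t libc_sim[1] = { real_puts };

static int slow_path_puts(const char *s);

/* Simulated GOT: slot 0 starts out pointing at the slow path, much as the
 * real slot initially points back into the PLT stub. */
static fn_t got_sim[1] = { slow_path_puts };

/* Simulated dynamic linker: patch the slot for the given function index. */
static void resolve(int index)
{
    resolver_calls++;
    got_sim[index] = libc_sim[index];
}

/* Simulated "push 0x0; jmp resolver": resolve function 0, then finish
 * the original call through the freshly patched slot. */
static int slow_path_puts(const char *s)
{
    resolve(0);
    return got_sim[0](s);
}

/* Simulated puts@plt: an indirect jump through the table. */
int puts_plt(const char *s)
{
    return got_sim[0](s);
}
```

Calling puts_plt twice goes through resolve only on the first call. After that, got_sim[0] holds real_puts and the second call reaches it directly.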
As an illustration, if we set a breakpoint right after the first call to puts,
we see that the default values in the GOT have been replaced with the proper
addresses, because the linker has updated the GOT to hold the correct address
for puts. Next, we check where the GOT begins by running
x _GLOBAL_OFFSET_TABLE_, which returns the result 0x601000: 0x00600e28.
Interesting, the start of the GOT is now closer to address 0x601018, where the
address of puts is stored. If we run x/8wx _GLOBAL_OFFSET_TABLE_ we get back:
0x601000: 0x00600e28 0x00000000 0xf7ffe168 0x00007fff
0x601010: 0xf7deee10 0x00007fff 0xf7a7c690 0x00007fff
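Each GOT entry is 64 bits wide and is shown here as two 32-bit words, low word
first. This means that the address for puts, stored at 0x601018, must be
0x00007ffff7a7c690. We can confirm this by running x/gx 0x601018, which prints
the entry as a single 64-bit value.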
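Because x/8wx prints 32-bit words on a little-endian machine, the word at 0x601018 is the low half of the pointer and the word at 0x60101c is the high half. A tiny sketch (join_words is a hypothetical helper) of how the two halves combine:

```c
#include <stdint.h>

/* Rebuild a 64-bit little-endian value from the two 32-bit words that
 * gdb's x/wx prints for it: the low word comes first in memory. */
uint64_t join_words(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

Here join_words(0xf7a7c690, 0x00007fff) gives 0x00007ffff7a7c690, which falls in the same range as the libc base address reported by ldd earlier.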
TL;DR
- The program doesn’t know at compile time where dynamically linked libraries
will be placed in memory, so it doesn’t know exactly where functions like puts
will be placed.
- As a result, the program instead calls a stub function for puts at a location
it does know, i.e., in the procedure linkage table.
- The stub function (puts@plt) gets the runtime address of the real puts from
the global offset table (a data structure updated by the linker at runtime).
- When we first call the library function from our C code, the linker updates
the GOT to use the correct address of the library function.
Example
Consider the following C code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

void hello()
{
    printf("Oh, it is on, like a prawn who yawns at dawn\n");
    _exit(1);
}

void vuln()
{
    char buffer[512];
    fgets(buffer, sizeof(buffer), stdin);
    printf(buffer);
    exit(1);
}

int main(int argc, char **argv)
{
    vuln();
}
Let’s step through the process of finding the GOT entry for exit() and
modifying that entry to instead point to hello().
After compiling this code, we open the binary in gdb. First we need the value
we will be using for the overwrite, specifically the address of the hello
function. We can find it in gdb using x hello. We get
0x804853b <hello>: 0x83e58955 as a result, so we know 0x804853b is the address
of hello. Side note: position-independent executables and ASLR will make it
harder to find this address.
Using disas main (or the source code above), we see that main calls vuln, so we
set a breakpoint at that call instruction. We start the program with run and
hit the breakpoint. We can step into the function using the si command. Once
inside vuln, we can use disas to look at the assembly. We see some C library
functions being called.
Let’s follow the call to exit into the PLT and the GOT; specifically, this is
the instruction we care about: 0x080485d8 <+61>: call 0x8048400 <exit@plt>.
A quick command (disas 0x8048400) will show us the code in the PLT:
Dump of assembler code for function exit@plt:
0x08048400 <+0>: jmp *0x804a01c
0x08048406 <+6>: push $0x20
0x0804840b <+11>: jmp 0x80483b0
End of assembler dump.
The first jump tells us the GOT entry we are looking for: 0x804a01c. We now
know the address in the GOT where the address of the exit C function is
stored. Let’s overwrite it using gdb. We run set {int}0x804a01c=0x804853b and
then c to continue. Surprise, surprise: once we give fgets a line of input,
the program outputs:
Oh, it is on, like a prawn who yawns at dawn
We have successfully redirected code execution by overwriting an address in
the global offset table, causing all calls to exit to call hello instead.
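The effect of the gdb write can be modelled in C with a single function pointer. This is a simulation, not an exploit: exit_slot plays the role of the GOT word at 0x804a01c, and fake_exit, hello_sim, call_exit_via_plt, and overwrite_exit_slot are hypothetical stand-ins that record what ran instead of terminating the process.

```c
#include <stdio.h>

typedef void (*exit_fn)(int);

static int exit_ran = 0;    /* set when the simulated exit is reached */
static int hello_ran = 0;   /* set when the simulated hello is reached */

/* Stand-in for libc's exit; it only records that it was called. */
static void fake_exit(int status)
{
    (void)status;
    exit_ran = 1;
}

/* Stand-in for hello(); it takes an int so the pointer types match. */
static void hello_sim(int status)
{
    (void)status;
    hello_ran = 1;
    puts("Oh, it is on, like a prawn who yawns at dawn");
}

/* Simulated GOT entry for exit. */
static exit_fn exit_slot = fake_exit;

/* Simulated exit@plt: jump to whatever the slot currently holds. */
void call_exit_via_plt(int status)
{
    exit_slot(status);
}

/* Simulated `set {int}0x804a01c=0x804853b`: replace the stored pointer. */
void overwrite_exit_slot(exit_fn target)
{
    exit_slot = target;
}
```

Before the overwrite, call_exit_via_plt reaches fake_exit. After overwrite_exit_slot(hello_sim), the same call site reaches hello_sim, even though the calling code itself never changed.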
Sources
Check out the following links for more information on how the global offset table works and how you might use it when exploiting binaries.
[1] https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html
[2] https://www.youtube.com/watch?v=kUk5pw4w0h4
[3] https://stackoverflow.com/questions/20486524/what-is-the-purpose-of-the-procedure-linkage-table
[4] https://www.youtube.com/watch?v=t1LH9D5cuK4
[5] https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html