Lecture Notes: Just-in-time Return-oriented Programming

In coarse-grained ASLR, only the base address of each region is randomized, e.g., the code section starts at a different offset each time the program is executed. If an attacker gains knowledge (via an information leak) of a single address in a code region (say, libc), that is often sufficient to find all other interesting code locations in that region because the relative offsets remain the same. For instance, if the attacker learns the runtime address of printf, they can figure out the address of system by adding a known offset. Caveat: this offset is only “known” if the attacker has knowledge of and access to the version of libc running on the system, a condition that is usually easy to satisfy.
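
To make the arithmetic concrete, here is a minimal C++ sketch of that offset computation. Every address and offset below is made up; in practice the attacker reads the real offsets out of the exact libc binary the target is running.

// Sketch only: all constants are hypothetical.
#include <cstdint>
#include <cstdio>

int main() {
    uintptr_t leaked_printf = 0x7f3a12c56e40;      // learned via an information leak (hypothetical)
    const uintptr_t PRINTF_OFFSET = 0x56e40;       // printf's offset inside this libc build (hypothetical)
    const uintptr_t SYSTEM_OFFSET = 0x4c490;       // system's offset inside the same build (hypothetical)

    uintptr_t libc_base   = leaked_printf - PRINTF_OFFSET;   // relative offsets are fixed,
    uintptr_t system_addr = libc_base + SYSTEM_OFFSET;       // so one leak locates everything else
    printf("libc base: %#lx, system(): %#lx\n",
           (unsigned long)libc_base, (unsigned long)system_addr);
    return 0;
}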

Fine-grained ASLR. To combat these information leaks, many researchers have proposed schemes to make ASLR more fine-grained. For example, a hypothetical fine-grained scheme might:

  • randomize code at the function or basic-block level,
  • swap the registers used in instructions, or
  • replace instructions with equivalents.

These transformations might be applied statically at compile time, at program start-up, or dynamically by rearranging code during execution. Of course, the overhead of these transformations is a concern, so most approaches fall into one of the first two categories (compile time or load time).

Is Software Secure Now? Fine-grained ASLR removes the case where a single leaked address is enough to construct a ROP chain or execute a ret2libc attack. However, fine-grained ASLR schemes (up until 2013) didn't account for the fact that, in many systems, a single information leak can be exploited multiple times. This oversight is the intuition that drives the research we will talk about today.

Just-In-Time Code Reuse

If an attacker can repeatedly leverage an information leak, they can dynamically map out a significant portion of code memory even when fine-grained ASLR is in place. If they can map out the code memory, they can examine that code on the fly to find gadgets and construct a new ROP chain. This attack is referred to as just-in-time code reuse, or JIT-ROP.

The basic outline of a JIT-ROP attack is as follows:

  • the attacker finds an information leak,
  • that information leak is repeatedly leveraged to map code memory,
  • the mapped memory is analyzed to find useful code (API calls, gadgets),
  • the attacker compiles a ROP chain using the useful code, and
  • the chain is inserted into memory and control flow is redirected to it.

Because all of these tasks are done just-in-time, JIT-ROP attacks target software that supports scripting, e.g., web browsers and PDF document viewers. Many of these steps are similar to what an attacker would do in a basic offline ROP attack; the only difference is that here they are all performed at runtime, e.g., by JavaScript running in the browser the attacker is trying to exploit.

The Leak. The first thing an attacker needs to execute a JIT-ROP attack is an information leak that allows them to read arbitrary addresses in memory and that can be leveraged repeatedly. While this requirement might seem hard to satisfy, consider the following example. The attacker (more accurately, the attacker's script):

  • creates an object with an exploitable buffer overflow,
  • creates a string object (the two objects are likely to be adjacent on the heap),
  • overflows the first object to corrupt the string object, setting the string's length field to a very large value (think 2^32), and
  • can now effectively read arbitrary memory locations by reading the character at offset X in the string (see the sketch below).
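
Here is a conceptual C++ sketch of that read primitive. The object layouts and names (Exploitable, ScriptString) are invented for illustration, and the "heap" is a single struct standing in for two adjacent allocations; the point is only that an overflow out of one object into the adjacent string's length field turns bounded string reads into an arbitrary-read primitive.

// Conceptual sketch only: layouts are hypothetical.
#include <cstdint>
#include <cstdio>
#include <cstring>

struct Exploitable {            // hypothetical object with an overflowable buffer
    char buf[16];
};

struct ScriptString {           // hypothetical engine string: length + character data
    uint32_t length;
    const char *data;
};

int main() {
    // simulate two adjacent heap allocations with one struct
    struct { Exploitable obj; ScriptString str; } heap = {{}, {5, "hello"}};

    // simulate the overflow: 20 bytes written into a 16-byte buffer, so the
    // last 4 bytes land on str.length and make it enormous (think 2^32)
    unsigned char payload[20] = {0};
    uint32_t huge = 0xffffffff;
    memcpy(payload + 16, &huge, sizeof(huge));
    memcpy(heap.obj.buf, payload, sizeof(payload));   // the out-of-bounds write

    // the engine's bounds check now passes for any offset below 2^32 - 1, so a
    // script-level "read character at offset X" returns whatever byte sits
    // X bytes past str.data, i.e., an arbitrary memory read
    printf("corrupted length: %#x\n", heap.str.length);
    return 0;
}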

Mapping Memory. The challenge with mapping memory is that if the attacker reads an unmapped address, the program will crash, and the attacker will lose all progress because the next execution of the program will re-randomize memory. Given this fact, and given that the vast majority of the virtual address space is unmapped, it doesn't make sense for an attacker to just guess at an address and hope they get lucky.

However, if the attacker can somehow get a single pointer into code memory, they instantly reveal an entire 4KB-aligned page of code memory that is guaranteed to be mapped. Further, it is quite likely that the revealed code page includes pointers to other code pages in memory, e.g., due to control-flow instructions.
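
A rough sketch of this harvesting step follows. The helper names are mine; read_byte stands in for the attacker's scripted read primitive (simulated here with a plain dereference), and the pointer scan is deliberately naive. The paper additionally discovers new pages by disassembling the harvested code and following direct call/jmp targets.

// Sketch under the stated assumptions.
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uintptr_t PAGE_MASK = ~uintptr_t(0xFFF);   // 4KB pages

static uint8_t read_byte(uintptr_t addr) {           // stand-in for the scripted read primitive
    return *reinterpret_cast<const uint8_t *>(addr);
}

std::vector<uintptr_t> harvest_page(uintptr_t leaked_code_ptr) {
    uintptr_t page = leaked_code_ptr & PAGE_MASK;    // the page holding the leaked pointer is mapped
    uint8_t bytes[0x1000];
    for (size_t i = 0; i < sizeof(bytes); ++i)
        bytes[i] = read_byte(page + i);              // never touches an unmapped page

    // naive scan for values that look like pointers to other code pages; these
    // are only candidates and must be validated before they are dereferenced
    std::vector<uintptr_t> candidate_pages;
    for (size_t i = 0; i + sizeof(uintptr_t) <= sizeof(bytes); i += sizeof(uintptr_t)) {
        uintptr_t v;
        std::memcpy(&v, bytes + i, sizeof(v));
        if (v != 0)
            candidate_pages.push_back(v & PAGE_MASK);
    }
    return candidate_pages;
}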

Going back to our previous web browser example, the attacker can allocate a third object on the heap (e.g., a CButtonLayout object, which contains function pointers). If that third object contains a code pointer, then the attacker only needs to read that field to get the needed first code pointer.

Linear-sweep Disassembly. Even with a pointer into a code page, that page is just a stream of bytes that the attacker needs to make sense of. Fortunately, the attacker can use existing disassembly techniques to parse the bytes into instructions. The most basic class of techniques is linear-sweep disassembly. The idea is simple: start at the first byte and try to interpret it as an instruction (remember, x86 instructions are variable-length), then continue at the byte immediately after that instruction and repeat the process.

Caveat: compilers will often mix data in with instructions, and if you try to interpret data as instructions you will get bad results, and those bad results may propagate to later instructions in the sweep.

There is a lot of work in the disassembly literature addressing these problems, but this paper relies on a simple heuristic: if an instruction is invalid, throw it out (along with some of its neighbors, because they were probably decoded incorrectly too). There is plenty of potential for optimization here, but the basic idea is sufficient for a proof of concept.
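
As an illustration (not the paper's implementation, which runs inside the attacker's script with its own decoder), here is a sketch of a linear sweep over a harvested page using the Capstone disassembly library, with a crude skip-one-byte-and-resynchronize rule standing in for the invalid-instruction heuristic. The function name is mine.

// Sketch only; assumes Capstone is available.
#include <capstone/capstone.h>
#include <cstdint>
#include <cstdio>

void linear_sweep(const uint8_t *page, size_t len, uint64_t base) {
    csh handle;
    if (cs_open(CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK) return;
    cs_insn *insn = cs_malloc(handle);

    const uint8_t *code = page;
    size_t remaining = len;
    uint64_t addr = base;
    while (remaining > 0) {
        if (cs_disasm_iter(handle, &code, &remaining, &addr, insn)) {
            // valid instruction: record/print it and continue right after it
            printf("0x%llx: %s %s\n", (unsigned long long)insn->address,
                   insn->mnemonic, insn->op_str);
        } else {
            // invalid bytes (likely data mixed into the code stream):
            // discard this byte and resynchronize at the next offset
            code++; remaining--; addr++;
        }
    }
    cs_free(insn, 1);
    cs_close(&handle);
}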

How well does code page harvesting work? In the simple example of Internet Explorer with an initial code pointer from CButtonLayout, they were able to discover 301 code pages from the IE process.

Of course, the number of pages the attacker can map depends on where they start. To test page harvesting from a variety of entry points, the authors ran a number of applications, took a snapshot of the application’s entire code memory and then tested the number of reachable pages from any given starting page. Results:

  • For Internet Explorer, over half of the potential starting pages allowed them to recover 300+ pages.
  • For Firefox and Adobe Acrobat, a quarter of starting pages allowed the authors to recover 1000+ pages.

Long story short, that is a lot of code pages to mine for gadgets.

Finding Useful Code. Now that the attacker has recovered code from the application, the next step is to find useful gadgets and API calls in that code.
Gadget finding is done using existing gadget-finding approaches.
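
For intuition only, a toy gadget locator might simply scan the harvested bytes for ret opcodes and treat the byte positions just before each one as candidate gadget start addresses; real tools (and the paper) validate each candidate with a disassembler. The helper below is a sketch under those assumptions, and its name is mine.

// Sketch only: `page` holds harvested bytes, `base` is their runtime address.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uintptr_t> ret_gadget_candidates(const uint8_t *page, size_t len,
                                             uintptr_t base, size_t max_len = 5) {
    std::vector<uintptr_t> candidates;
    for (size_t i = 0; i < len; ++i) {
        if (page[i] != 0xC3)                        // 0xC3 is the near-ret opcode
            continue;
        size_t start = (i >= max_len) ? i - max_len : 0;
        for (size_t s = start; s < i; ++s)          // every start offset ending at this ret
            candidates.push_back(base + s);
    }
    return candidates;
}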

API Calls. The API calls here are important because they are the interface between the process and the operating system and, as such, allow the attacker to accomplish something interesting with the attack. In other words, executing a bunch of arbitrary gadgets is cute, but it is not useful unless the attacker can use those gadgets to do something useful, like pop a shell.

In the gadget chains we've discussed so far in this course, our API targets are either system calls (via int 0x80 or syscall instructions) or libc functions (e.g., system()). The authors made a number of interesting observations here (remember their work has a Windows focus):

  1. Attackers want to avoid raw system calls because syscall numbers vary greatly between OS versions; thus, it is hard to make a one-size-fits-all exploit that works against a wide range of vulnerable users.
  2. Common wisdom is to use a VirtualProtect call to change page permissions so that more traditional shellcode can execute, but VirtualProtect call sites are rare.
  3. However, LoadLibrary and GetProcAddress call sites are common, and that pair of functions can accomplish the same tasks as traditional shellcode.

For reference, LoadLibrary loads a software module (e.g., a DLL file) into the address space, and GetProcAddress returns a pointer to a named variable/function in that library. In the paper, the authors used the following to launch the calculator program:

// Pseudocode
k32 = LoadLibrary("kernel32")
WinExec = GetProcAddress(k32, "WinExec")
WinExec("calc")
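
For reference, a native (non-ROP) C++ version of that call chain looks roughly like the following; in the actual attack each call is reached through gadgets rather than compiled code, and the function-pointer cast reflects WinExec's documented signature.

// Sketch of the native equivalent of the pseudocode above.
#include <windows.h>

int main() {
    HMODULE k32 = LoadLibraryA("kernel32");                 // already loaded in practice
    if (!k32) return 1;
    FARPROC p = GetProcAddress(k32, "WinExec");             // resolve WinExec by name
    if (!p) return 1;
    auto winExec = reinterpret_cast<UINT (WINAPI *)(LPCSTR, UINT)>(p);
    winExec("calc", SW_SHOWNORMAL);                         // launch the calculator
    return 0;
}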

Building the ROP chain. At this point, the attacker has mapped out a significant number of code pages, disassembled them, and found a list of addresses for potentially useful gadgets and API functions. The next step is to take those building blocks and just-in-time compile them into a working ROP chain that accomplishes the attacker's goal. Since the authors use existing work for this step, we won't go into detail.

Hijacking Control-flow. The final step is to redirect control flow to execute the ROP chain. This requires another memory error that allows the attacker to corrupt a code pointer; it is a process we should already be familiar with.

Putting it all together. It is important to reiterate that in a JIT-ROP attack, all of these steps are typically done at runtime, inside a script running in the target application, e.g., JavaScript in the web browser. For their proof of concept, the authors wrote most of the code in C++ and used LLVM to transform it into JavaScript. They then used that JavaScript to exploit a real vulnerability in Internet Explorer (CVE-2012-1876).

Discussion Questions

What's the primary contribution of this paper? I'd say two things: first, the observation that fine-grained ASLR makes the unfounded assumption that a memory leak will only be used once; second, the observation that a single pointer into a code page reveals an entire 4KB page in which to look for pointers to additional pages. Much of the rest of the work is engineering, i.e., putting together components that already exist.

How do we make ASLR better? Well, one idea is to modify the code pages to remove direct references to other code pages, i.e., prevent the mapping step. It turns out (as discussed in the Isomeron paper) that there are other ways to leak code pointers (e.g., through objects/function pointers on the heap/stack). Alternatively, we can try to stymie the JIT disassembly through anti-disassembly techniques; I don't know enough about the topic to really say whether this would work well. Finally, we can make it harder for the bad guy to accomplish anything useful by removing API calls: an attacker can't use LoadLibrary if it isn't in the program. This last approach is essentially what Chrome does with its sandboxing.

For more inspiration

Lecture based on the paper “Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization” by Snow et al.