Fuzzing sockets: Apache HTTP, Part 2: Custom Interceptors
In the first part of this series, I explained how to start fuzzing Apache HTTP, how to implement custom mutators in AFL++, and how to define your own HTTP grammar.
In this second installment, I will focus on how to build our own custom ASAN interceptors in order to catch memory bugs when custom memory pools are implemented and also on how to intercept file system syscalls to detect logic errors in the target application.
Let’s get on with it!
Manual poisoning
Let’s first quickly review how Address Sanitizer (ASAN) shadow memory and poisoning work.
ASAN maintains a shadow memory that tracks the state of the program’s real memory and can determine whether any given byte is addressable or not. Bytes in invalid memory regions are called red zones or poisoned memory.
So, when you compile your program with Address Sanitizer, it instruments every memory access and prefixes it with a check. At run time, if the program tries to read from or write to an invalid memory region, ASAN stops the execution and generates a diagnostic report. Otherwise, it allows the program to continue. This allows you to detect many kinds of invalid memory accesses and memory mismanagement issues.
In some cases, having greater control over the poisoned memory can be useful for developers and security researchers. Consider, for example, a custom function that handles memory in a way that ASAN is not able to track. That is why ASAN also provides an external API for manual memory poisoning, which allows the user to poison and unpoison a region of memory manually.
We can make use of these features through the external ASAN interface by including the ASAN interface header: #include <sanitizer/asan_interface.h>.
Then we can use the ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION macros in the code paths where our program allocates and frees memory. The typical workflow is to poison the entire memory region first, and then unpoison allocated chunks of memory, leaving poisoned red zones between them.
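The workflow just described can be sketched with a toy bump allocator. This is a minimal illustration and not code from Apache or ASAN: the arena_t type, its functions, and the REDZONE width are hypothetical names chosen for this example. Only the two ASAN macros and the header come from the real interface, and they fall back to no-ops when the file is built without -fsanitize=address.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real ASAN interface when building with -fsanitize=address;
   otherwise define no-op macros so the sketch still compiles. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAS_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_HAS_ASAN 1
#endif

#ifdef SKETCH_HAS_ASAN
#  include <sanitizer/asan_interface.h>
#else
#  define ASAN_POISON_MEMORY_REGION(addr, size)   ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

#define ARENA_SIZE 4096
#define REDZONE    16   /* hypothetical red zone width between chunks */

typedef struct {
    unsigned char buf[ARENA_SIZE];
    size_t used;
} arena_t;

/* Step 1: poison the whole region before handing anything out. */
static void arena_init(arena_t *a) {
    assert(a != NULL);
    a->used = 0;
    ASAN_POISON_MEMORY_REGION(a->buf, ARENA_SIZE);
}

/* Step 2: unpoison exactly the requested bytes; the alignment padding
   and the red zone that follow stay poisoned. */
static void *arena_alloc(arena_t *a, size_t size) {
    size_t aligned;
    void *p;
    if (size == 0 || size > ARENA_SIZE) return NULL;
    aligned = (size + 7u) & ~(size_t)7u;
    if (a->used + aligned + REDZONE > ARENA_SIZE) return NULL;
    p = a->buf + a->used;
    ASAN_UNPOISON_MEMORY_REGION(p, size);
    a->used += aligned + REDZONE;
    return p;
}

/* Make the memory addressable again before it is reused for anything else. */
static void arena_release(arena_t *a) {
    a->used = 0;
    ASAN_UNPOISON_MEMORY_REGION(a->buf, ARENA_SIZE);
}
```

With ASAN enabled, writing one byte past a chunk returned by arena_alloc lands in poisoned memory and should be reported, even though the byte still lies inside the arena buffer.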
This approach is relatively simple and easy to implement but suffers from “reinventing the wheel” each time we target a new program. It would be great if we could simply intercept a certain group of functions, in the same way as ASAN does.
For that reason, I’m going to show another approach: custom interceptors.
Custom interceptors
Motivation
At the beginning of this blog post, I talked about the need to implement custom interceptors for dealing with custom memory pool implementations, as is the case with Apache HTTP. So the question that we have to ask ourselves is “why do we need to implement custom interceptors?”
Let’s look at an example to get a better understanding.
Consider, for example, a code snippet in which apr_palloc is called to allocate memory. In this case, the value of the second argument is 126 (in_size = 126) or, in other words, our aim is to allocate 126 bytes inside the g->apr_pool memory pool. These 126 bytes will be rounded up to 128 bytes due to memory alignment requirements. This much is clear.
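The rounding from 126 to 128 bytes can be reproduced with a small helper. This is a sketch that assumes an 8-byte default alignment and mirrors the usual power-of-two round-up expression; the function names align_up and align_default are hypothetical and are not part of APR.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of boundary (a power of two). */
static size_t align_up(size_t size, size_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size + (boundary - 1)) & ~(boundary - 1);
}

/* Assumption for this sketch: the pool aligns every request to 8 bytes. */
static size_t align_default(size_t size) {
    return align_up(size, 8);
}
```

Under that assumption, align_default(126) evaluates to 128, which matches the figure given above.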
If you review our previous ProFTPd blog post, you can find information on how ProFTPd memory pools are implemented internally. That memory pool implementation is based on the Apache HTTP one, so in this case the implementation is almost the same. Apache HTTP memory pools consist of a linked list of memory nodes.
The program adds new nodes to this linked list as additional space is needed. When the free space of a node is not enough to meet the apr_palloc demand, the allocator_alloc function is called. This function is responsible for creating a new node and adding it to the linked list. However, the allocation size is always rounded up to at least MIN_ALLOC bytes. Therefore, each of these nodes will have a minimum size of MIN_ALLOC.
Later, a call to malloc is made inside this function with the aim of allocating new memory for this node. In our example, this results in a malloc call with size = 8192.
We find ourselves facing a scenario in which we made a call to apr_palloc with size = 126, but ASAN has marked a memory area of 8192 bytes as addressable.
The end result is a total of 8192 - 126 = 8066 bytes marked as writable by ASAN when it is in fact not real allocated memory but rather free space in the node. So, a subsequent memcpy(np, source, 5000) call would lead to an out-of-bounds write, overwriting the rest of the node’s memory. However, we wouldn’t see any ASAN alert message, which would cause us to miss memory corruption bugs even with ASAN enabled.
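The arithmetic behind this blind spot can be written down as a short sketch. It is a simplification and not APR’s allocator: the node header is ignored, the 4096-byte rounding granularity (BOUNDARY_SIZE) is an assumption of this example, and the helper names are hypothetical. Only the 8192-byte minimum and the 126-byte request come from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define BOUNDARY_SIZE 4096u                 /* assumed rounding granularity */
#define MIN_ALLOC     (2u * BOUNDARY_SIZE)  /* 8192, the minimum node size */

/* Size passed to malloc for a new node (node header ignored here). */
static size_t node_alloc_size(size_t in_size) {
    size_t size = (in_size + (BOUNDARY_SIZE - 1)) & ~(size_t)(BOUNDARY_SIZE - 1);
    return size < MIN_ALLOC ? MIN_ALLOC : size;
}

/* Bytes that ASAN treats as addressable although nobody requested them. */
static size_t unguarded_slack(size_t in_size) {
    size_t node = node_alloc_size(in_size);
    assert(in_size <= node);
    return node - in_size;
}

/* Nonzero when a write of write_len bytes overflows the request but
   stays inside the malloc-ed node, so ASAN has nothing to report. */
static int overflow_goes_unnoticed(size_t in_size, size_t write_len) {
    return write_len > in_size && write_len <= node_alloc_size(in_size);
}
```

For the example in the text, unguarded_slack(126) is 8066 and overflow_goes_unnoticed(126, 5000) is true: the 5000-byte memcpy overruns the 126-byte request but never leaves the 8192-byte block that malloc returned.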
Errors like these can result in vulnerabilities such as the one I published a year ago in ProFTPD: CVE-2020-9273.
Preparatory steps
Next, I’m going to explain how to build the LLVM sanitizers from source. This is needed in order to add our own custom ASAN interceptors to the ASAN library.
First, you should know that the LLVM sanitizer runtimes are part of what is known as the “compiler-rt” runtime libraries. In my case, I downloaded version 9.0.0 of the compiler-rt sources because it was the version that I previously had installed in my Linux distro. You can download these sources from the following link: https://releases.llvm.org/9.0.0/compiler-rt-9.0.0.src.tar.xz
You can build it with:
cd compiler-rt-9.0.0.src
mkdir build-compiler-rt
cd build-compiler-rt
cmake ../
make
After the compiler-rt build process is finished, you must add the following environment variables to the Apache build process:
LD_LIBRARY_PATH=/Downloads/compiler-rt-9.0.0.src/build-compiler-rt/lib/linux
CFLAGS="-I/Downloads/compiler-rt-9.0.0.src/lib -shared-libasan"
Finally, you need to set the LD_LIBRARY_PATH environment variable as follows:
LD_LIBRARY_PATH=/Downloads/compiler-rt-9.0.0.src/build-compiler-rt/lib/linux
ASAN interceptor internals
As we saw before, ASAN needs to intercept functions like malloc and free to track memory usage. This requires the runtime to be loaded first, prior to the library that exports these functions. For that reason, when we add the -fsanitize=address flag, the compiler sets libasan first in the symbol lookup order.
If we inspect the code, we can see that the entry function is called __asan_init. This function in turn calls the AsanActivate and AsanInitInternal functions. It is in this second function that most of the initialization steps take place, and where InitializeAsanInterceptors is called. This last function is the most important for our purpose.
InitializeAsanInterceptors contains an ASAN_INTERCEPT_FUNC call for each of the functions that ASAN intercepts by default. ASAN_INTERCEPT_FUNC is a macro which is translated into INTERCEPT_FUNCTION_LINUX_OR_FREEBSD on Linux systems. This macro ends up calling the InterceptFunction function, which performs all the actual function hooking logic.
#define ASAN_INTERCEPT_FUNC(name) do { \
    if (!INTERCEPT_FUNCTION(name) && flags()->verbosity > 0) \
      Report("AddressSanitizer: failed to intercept '" #name "'\n"); \
  } while (0)

# define INTERCEPT_FUNCTION(func) INTERCEPT_FUNCTION_LINUX_OR_FREEBSD(func)

#define INTERCEPT_FUNCTION_LINUX_OR_FREEBSD(func) \
  ::__interception::InterceptFunction( \
      #func, \
      (::__interception::uptr *) & REAL(func), \
      (::__interception::uptr) & (func), \
      (::__interception::uptr) & WRAP(func))
InterceptFunction calls the GetFuncAddr function and this, in turn, calls dlsym(). dlsym allows the program to retrieve the address where that symbol (the intercepted function) is loaded into memory.
It later stores the address of this function into the ptr_to_real pointer.
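The lookup step can be illustrated outside of compiler-rt with a few lines of C. This is a sketch of the general idea only, not a copy of GetFuncAddr: it uses the POSIX dlopen(NULL, ...) handle to search the process’s global symbols, strlen is picked as an arbitrary example symbol, and the names real_strlen, resolve_real_strlen and counted_strlen are hypothetical. On older glibc versions it may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Plays the role of ptr_to_real: filled in once, then used by the wrapper. */
static strlen_fn real_strlen;
static unsigned long strlen_calls;

/* Ask the dynamic loader where "strlen" lives and remember the address.
   Returns 1 on success and 0 if the symbol could not be resolved. */
static int resolve_real_strlen(void) {
    void *self = dlopen(NULL, RTLD_NOW);   /* handle for the global symbol scope */
    void *addr;
    if (self == NULL) return 0;
    addr = dlsym(self, "strlen");
    if (addr == NULL) return 0;
    /* POSIX idiom for turning the void* from dlsym into a function pointer. */
    *(void **)(&real_strlen) = addr;
    return 1;
}

/* A wrapper that does its own bookkeeping and then forwards to the
   address obtained from dlsym. */
static size_t counted_strlen(const char *s) {
    assert(real_strlen != NULL);
    strlen_calls++;
    return real_strlen(s);
}
```

An interceptor does the same two things at a larger scale: it resolves the original function once during initialization and forwards to it from the wrapper after running its own checks.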
So, to sum up, to define our own ASAN Interceptors, we need to perform the following steps:
- Define INTERCEPTOR(int, foo, const char *bar, double baz) { ... }, where foo is the name of the function we want to intercept
- Call ASAN_INTERCEPT_FUNC(foo) prior to the first call of function foo (usually from the InitializeAsanInterceptors function)
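The relationship between the two steps can be mimicked in plain C. The sketch below is a toy model and does not use the compiler-rt macros: TOY_REAL, TOY_WRAP, foo_entry and toy_intercept_foo are hypothetical names, and a function pointer stands in for the symbol resolution that the dynamic loader performs in the real runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the REAL() and WRAP() naming scheme. */
#define TOY_REAL(func) real_##func
#define TOY_WRAP(func) wrap_##func

/* The "library" function we want to intercept. */
static int foo(const char *bar, double baz) {
    return (int)strlen(bar) + (int)baz;
}

/* Pointer to the original function, filled in when interception is set up. */
static int (*TOY_REAL(foo))(const char *, double);
static int foo_calls;

/* Step 1: the interceptor body. It runs extra logic, then calls the original. */
static int TOY_WRAP(foo)(const char *bar, double baz) {
    assert(TOY_REAL(foo) != NULL);
    foo_calls++;
    return TOY_REAL(foo)(bar, baz);
}

/* Callers go through this pointer, which models symbol resolution. */
static int (*foo_entry)(const char *, double) = foo;

/* Step 2: register the interceptor before the first call to foo. */
static void toy_intercept_foo(void) {
    TOY_REAL(foo) = foo;
    foo_entry = TOY_WRAP(foo);
}
```

Before toy_intercept_foo runs, calls through foo_entry reach foo directly. Afterwards they pass through the wrapper first, which is why the registration has to happen before the first call that should be observed.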
Now, I will show a real example of how to intercept a function of the APR (Apache Portable Runtime) library.
apr_palloc example
As previously noted, Apache uses custom memory pools to improve the management of dynamic memory. That is why, if we want to allocate memory inside a memory pool, we should invoke apr_palloc instead of malloc.
First of all, let’s walk through my implementation of INTERCEPTOR(void*, apr_palloc, …).
The ENSURE_ASAN_INITED() macro checks that ASAN has been initialized before continuing the execution. The GET_STACK_TRACE_MALLOC macro retrieves the current stack trace so that, if an error is detected, it can be displayed in the ASAN report. As a general rule, we retrieve the stack trace in the interceptors as early as possible, as we don’t want the stack trace to contain functions from ASAN internals.
Then, we call the original apr_palloc function using REAL(apr_palloc) in order to create all the internal structures for the memory pool. The apr_palloc function itself invokes the allocator_alloc function, which is responsible for allocating memory when it is needed. Inside allocator_alloc, we replace the malloc call (which is intercepted by ASAN) with __libc_malloc. This avoids unpoisoning the whole memory of the node.
After the program returns from the apr_palloc function, we align the “in_size” integer in the same way APR does. This makes both sizes the same, and just after that we call asan_malloc to allocate a new memory block of size “in_size”. This newly allocated memory will be handled by ASAN.
Finally, we store both the libc_malloc and asan_malloc memory addresses in an array, so that we are able to free the asan-malloced memory block when the node is destroyed.
Just as we have done with malloc, we also need to change the calls to free() that are related to the freeing of the nodes. In our case, we are going to modify the allocator_free and apr_allocator_destroy functions. In addition, we have to free the memory that we previously allocated with asan_malloc. To that end, I traverse the “addr” array where the addresses have been stored and I free all the memory blocks linked to this node. Finally, I include a direct call to the libc free() function using the __libc_free(node) statement.
I use this approach because it’s simple and easy to explain, but it’s pretty inefficient as it requires traversing the whole “addr” array. A better approach would be to store node addresses in a unique(vector) or std::set and to point each of these addresses to a linked list of asan_malloc-ed addresses.
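The array-based bookkeeping described above can be sketched as follows. This is a hypothetical reconstruction of the idea and not the author’s code: the tracked_t type, the track and release_node helpers, and the fixed capacity are assumptions of this example, and the standard free() stands in for freeing an ASAN-tracked block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 1024   /* hypothetical capacity of the table */

typedef struct {
    void *node;    /* pool node obtained from the raw libc allocator */
    void *shadow;  /* matching block obtained from the ASAN allocator */
} tracked_t;

static tracked_t addr[MAX_TRACKED];

/* Remember that shadow belongs to node. Returns 0 if the table is full. */
static int track(void *node, void *shadow) {
    size_t i;
    assert(node != NULL);
    for (i = 0; i < MAX_TRACKED; i++) {
        if (addr[i].node == NULL) {
            addr[i].node = node;
            addr[i].shadow = shadow;
            return 1;
        }
    }
    return 0;
}

/* Free every tracked block linked to node; returns how many were freed.
   The whole table is scanned, which is the inefficiency noted above. */
static size_t release_node(void *node) {
    size_t i, freed = 0;
    assert(node != NULL);
    for (i = 0; i < MAX_TRACKED; i++) {
        if (addr[i].node == node) {
            free(addr[i].shadow);
            addr[i].node = NULL;
            addr[i].shadow = NULL;
            freed++;
        }
    }
    return freed;
}
```

In this model, the interceptor would call track after each allocation, and the modified node-freeing code would call release_node just before handing the node itself back to libc.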
File monitors
When we fuzz a file server such as an FTP server or HTTP server, we send multiple requests that get translated into filesystem syscalls on the remote server (open(), write(), read(), etc.). This could trigger logic vulnerabilities related to file access permissions, such as access bypass, business flow bypass, etc.
In the majority of cases, however, detecting such vulnerabilities using a fuzzer like AFL can be a complicated task. This is because this type of fuzzer is more oriented towards detecting memory management vulnerabilities (stack overflows, heap overflows, etc.). It is for this reason that we need to implement new detection methods to catch these vulnerabilities.
The following file monitor method is a basic approach based on intercepting and saving file system call info for later analysis. The main idea driving the analysis is to compare the filesystem calls with their high-level counterparts, and to check if invoked syscalls are actually what they are supposed to be, and that the call order and arguments are correct.
To illustrate this point, I’ll show an example with three different WebDAV requests:
- PUT
- MOVE
- DELETE
You can download the file that contains these requests in the following link.
First, I’m going to identify high-level functions involved in handling these HTTP methods. In the case of PUT, MOVE and DELETE these functions are respectively:
static int dav_method_put(request_rec *r)
static int dav_method_copymove(request_rec *r, int is_move)
static int dav_method_delete(request_rec *r)
Then, I’m going to insert a call to the log_high function at the beginning of each function. Likewise, we can insert an ENABLE_LOG = 0; at the end of each function.
Now, just as we demonstrated in the previous section, we will use ASAN interception to intercept file system syscalls. In the case of Apache, I’ve intercepted the following syscalls:
- open
- rename
- unlink
This is an example of the output file:
After we obtain our output file, we need to analyze it. In my case, I used Elasticsearch to perform an a posteriori analysis. How to perform the actual analysis is beyond the scope of this post, but in a future post I’ll share how I have made use of Elasticsearch for my API log analysis. I will also explain how to perform a real-time analysis using AFL++.
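The monitoring scheme described in this section can be sketched in a few lines of C. This is an illustrative model and not the code used in the research: the log format, the in-memory log buffer, the assumption that log_high switches ENABLE_LOG on, and the monitored_unlink wrapper (standing in for an ASAN interceptor of unlink) are all choices made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int ENABLE_LOG = 0;    /* set while a monitored handler is running */
static char log_buf[4096];    /* in-memory stand-in for the output file */

/* Append one record to the log, but only while logging is enabled. */
static void log_line(const char *level, const char *name, const char *arg) {
    size_t used = strlen(log_buf);
    if (!ENABLE_LOG) return;
    snprintf(log_buf + used, sizeof log_buf - used, "%s %s %s\n",
             level, name, arg);
}

/* Hypothetical counterpart of log_high: called at the top of a handler
   such as dav_method_delete to record the high-level operation. */
static void log_high(const char *method, const char *uri) {
    assert(method != NULL && uri != NULL);
    ENABLE_LOG = 1;
    log_line("HIGH", method, uri);
}

/* Stand-in for an unlink interceptor: record the call, then forward it. */
static int monitored_unlink(const char *path) {
    log_line("LOW", "unlink", path);
    return unlink(path);
}
```

A DELETE handler instrumented this way would produce a HIGH record followed by the LOW records of the syscalls it triggers, so an analysis step can flag, for example, an unlink that is not preceded by a matching DELETE or that targets a path outside the requested resource.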
To be continued…
In the last post of this series, I will detail the vulnerabilities that I’ve found in Apache HTTP using the methodologies outlined in parts one and two. As this will also be the last post of my “fuzzing sockets” series, I will summarize some of my key learnings as well as introduce you to my next research subject.
Stay tuned for part three!