Blue-teaming for Exiv2: how to squash bugs by enrolling in OSS-Fuzz

OSS-Fuzz is Google’s awesome fuzzing service for open source projects. GitHub Security Lab’s @kevinbackhouse describes enrolling a project.

Kevin Backhouse·@kevinbackhouse

November 23, 2021 | Updated December 3, 2021 | 7 minutes

This is the final blog of a four-part series about how I am helping to harden the security of the Exiv2 project. This post is about how we enrolled Exiv2 in OSS-Fuzz, which is Google’s awesome fuzzing service for open source projects.

Getting Exiv2 enrolled in OSS-Fuzz took more work than I originally imagined, but it was worth it. Going through the process led to the discovery of quite a few more bugs. It was also an interesting lesson for me as a security researcher: working on the code and changing it helped me find more bugs than when I previously treated codebases as read-only artifacts.

Wait, didn’t you already do this, Kev?

Last year, I wrote a blog post about fuzzing Exiv2 with AFL. It helped to uncover many bugs in Exiv2 and stopped the flow of new vulnerability reports for approximately a year. But as I explained in the first post of this new blog series, we started to receive new bug reports in April of this year. The problem is that I only did the AFL fuzzing as a one-time exercise, and didn’t put a continuous process in place. That was amateur hour. This time we needed to get serious, and for an open source project like Exiv2 that means enrolling in OSS-Fuzz.

Adding a libFuzzer target

Fuzzing Exiv2 with AFL is quite straightforward: you build Exiv2 with the afl-clang compiler and then let AFL loose on it. Enrolling a project in OSS-Fuzz is more involved because you have to create a libFuzzer fuzzing target, which means changing the source code and build system. It’s significantly more work, but it also gives you the opportunity to fuzz much more thoroughly. As I mentioned a few weeks ago, the biggest problem with the AFL fuzzing of Exiv2 that I did last year was that I only tested Exiv2 in its default configuration. Exiv2 has numerous command-line options that I didn’t test, and the majority of the new bugs found this year only affect non-standard command-line options. Exiv2’s libFuzzer target is designed to test all the parts of the codebase that can be reached via those command-line options, so its coverage is much better than what I achieved with AFL last year.

libFuzzer adds its own `main` function

This is the biggest difference between libFuzzer and AFL. When you write a libFuzzer target, you don’t write a main function. Instead, you write an entry point named LLVMFuzzerTestOneInput:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    // Run some tests with the data
    ...

    return 0;
}

As you can see from the declaration, the entry point is passed an array of bytes, which you can use however you like to test your code. In the case of Exiv2, we treat the byte array as though it’s an image file and attempt to read, print, and modify its image metadata. A common technique, which we haven’t used in Exiv2’s fuzz target, is to use the first few bytes of the buffer as a header with options encoded as a bitmap. For example, we could have used that technique to mimic Exiv2’s command-line options. Instead, our fuzz target just attempts to run all of Exiv2’s different modes and get the maximum amount of code coverage with every test input.
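To make the header technique concrete, here is a minimal sketch of it. This is not how Exiv2's fuzz target works (as noted above, it runs every mode on every input). The option bits, the `FuzzInput` struct, and the `split_input` helper are all hypothetical names invented for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical option bits, standing in for a tool's command-line flags.
constexpr uint8_t kOptPrint  = 0x01;
constexpr uint8_t kOptModify = 0x02;
constexpr uint8_t kOptErase  = 0x04;

// The decoded form of one fuzz input: option flags plus the remaining payload.
struct FuzzInput {
    bool print;
    bool modify;
    bool erase;
    const uint8_t* payload;
    size_t payload_size;
};

// Split the buffer into a one-byte option header and the payload.
// Returns false when the buffer is too short to contain the header.
static bool split_input(const uint8_t* data, size_t size, FuzzInput& out) {
    if (size < 1) return false;
    out.print  = (data[0] & kOptPrint) != 0;
    out.modify = (data[0] & kOptModify) != 0;
    out.erase  = (data[0] & kOptErase) != 0;
    out.payload = data + 1;
    out.payload_size = size - 1;
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    FuzzInput in{};
    if (!split_input(data, size, in)) return 0;
    // A real target would now call into the library under test, choosing
    // which modes to run based on in.print, in.modify, and in.erase.
    return 0;
}
```

The trade-off is that each input only exercises the modes selected by its header, so the fuzzer has to discover useful header values itself. Running every mode on every input avoids that, at the cost of slower executions.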

To build a libFuzzer target, you add the compiler/linker flag -fsanitize=fuzzer so that the linker will add its own main wrapper around the LLVMFuzzerTestOneInput entry point. But you have to be careful not to use the -fsanitize=fuzzer flag on libraries because that will cause linker errors. Instead, you use -fsanitize=fuzzer-no-link on the libraries.
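To picture what the wrapper does, it can help to see the simplest possible driver: something that reads a file and passes its bytes to the entry point once. The sketch below is only an illustration of that idea, not libFuzzer's implementation, which also handles mutation, coverage feedback, and crash reporting. The `replay_file` helper, the `g_bytes_seen` counter, and the toy entry point are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Counter used only so that this sketch has an observable effect.
static size_t g_bytes_seen = 0;

// Toy entry point so that the sketch is self-contained. A real target
// would exercise the library under test here.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    (void)data;
    g_bytes_seen += size;
    return 0;
}

// Hypothetical helper: read one file and hand its bytes to the entry
// point once. Returns false if the file could not be opened.
static bool replay_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                                  std::istreambuf_iterator<char>());
    LLVMFuzzerTestOneInput(reinterpret_cast<const uint8_t*>(bytes.data()),
                           bytes.size());
    return true;
}
```

A hand-written `main` that calls `replay_file` on each of its arguments would be one way to reproduce a crashing input in a build that was not linked with -fsanitize=fuzzer.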

#1773 is the pull request in which we added the fuzz target to Exiv2. The fuzz target worked nicely, finding numerous bugs, but it took a bit more tinkering to make it compatible with OSS-Fuzz.

How to enroll in OSS-Fuzz

Enrolling in OSS-Fuzz is, in principle, quite straightforward. You create a pull request against the google/oss-fuzz GitHub repository, in which you add a directory for your project containing a few config files that define how to build your project’s fuzz target(s). #6186 is the pull request in which I added Exiv2.

As the timeline on #6186 shows, it took me several attempts to get the configuration right. Most of my difficulties were caused by being confused about the main function wrapper that libFuzzer adds around the LLVMFuzzerTestOneInput entry point. It turns out that you should not use the -fsanitize=fuzzer linker option when you are building for OSS-Fuzz. Instead, OSS-Fuzz passes an environment variable to your build system that you need to include on the linker command line, as we now do here. The other mistake that I made in the build script was to automatically add sanitizer flags like -fsanitize=address when building the fuzz target. It turns out that OSS-Fuzz wants to control the sanitizer flags itself, so your build script should not add them when building for OSS-Fuzz.

It took me far too long to figure this out, but the best way to test your OSS-Fuzz configuration is by creating a pull request into the main branch of your own fork, like I eventually did here. Your own fork has all the same workflows as Google’s repository, so it lets you debug the failures at your own leisure before you create a pull request on Google’s repository.

It’s a good idea to run your fuzz target privately before enrolling in OSS-Fuzz, so that you don’t get hit with a deluge of bugs. I ran Exiv2’s fuzz target for many weeks on a rented cloud server until I was reasonably confident that I had found everything. Even so, OSS-Fuzz has managed to find several new issues since we added Exiv2. One of the reasons why OSS-Fuzz is finding new issues that I missed is that it runs multiple fuzzing engines—with different sanitizer flags—whereas I only ran one libFuzzer configuration.

Corpus and dictionary

To fuzz effectively, you need to supply a corpus of inputs that give good coverage of your codebase. Exiv2 has several hundred image files in its test/data subdirectory, which we use as the initial corpus. As I mentioned in the second post in this series, one of the reasons why it’s important to add a regression test when you fix a bug is that it helps to improve the quality of your fuzzing corpus.

Using CodeQL to build a dictionary

A good corpus is the most important factor, but you can also improve the effectiveness of fuzzing by adding a dictionary. As an example, adding a dictionary helped to find GHSA-v5g7-46xf-h728. The bug was in this code:

if (buf.length() > 5 && buf.substr(0, 5) == "type=") {
    std::string::size_type pos = buf.find_first_of(' ');
    type = buf.substr(5, pos-5);
    // Strip quotes (so you can also specify the type without quotes)
    if (type[0] == '"') type = type.substr(1);  // <===== out-of-bounds array access
    if (type[type.length()-1] == '"') type = type.substr(0, type.length()-1);
    b.clear();
    if (pos != std::string::npos) b = buf.substr(pos+1);
}

To hit the bug, the input file needs to contain the string “type=”. So adding “type=” to Exiv2’s fuzzing dictionary helped to find this bug.
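The snippet above indexes into `type` without first checking that it is non-empty. For example, an input such as `type="` leaves `type` empty after the leading quote is stripped, and `type.length()-1` then wraps around to a huge index. Below is a minimal sketch of the same logic with emptiness checks added. It is not the fix that was actually applied in Exiv2, and the `parse_type` function name and signature are hypothetical.

```cpp
#include <string>

// Hypothetical helper mirroring the snippet above, with empty-string checks
// added. Returns the parsed type and stores the rest of the line in `rest`.
static std::string parse_type(const std::string& buf, std::string& rest) {
    std::string type;
    rest.clear();
    if (buf.length() > 5 && buf.compare(0, 5, "type=") == 0) {
        const std::string::size_type pos = buf.find_first_of(' ');
        type = (pos == std::string::npos) ? buf.substr(5)
                                          : buf.substr(5, pos - 5);
        // Strip quotes, checking for an empty string before each access.
        if (!type.empty() && type.front() == '"') type.erase(0, 1);
        if (!type.empty() && type.back() == '"') type.pop_back();
        if (pos != std::string::npos) rest = buf.substr(pos + 1);
    }
    return type;
}
```

With these checks, inputs like `type="` or `type= x` yield an empty type instead of indexing past the end of the string.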

A relatively simple CodeQL query can help you create a dictionary. This is the query that we used for Exiv2:

import cpp
import semmle.code.cpp.dataflow.DataFlow

predicate parser_string(string s, StringLiteral l) {
  s = l.getValue() and
  exists(FunctionCall call, string fcnName |
    DataFlow::localExprFlow(l, call.getAChild+()) and
    fcnName = call.getTarget().getName()
  |
    fcnName.matches("%cmp%") or
    fcnName.matches("%find%") or
    fcnName = "startsWith" or
    fcnName = "operator==" or
    fcnName = "operator!="
  )
}

from string s
where parser_string(s, _)
select s

All it does is find literal strings that flow into an argument of a comparison or search function, such as strcmp, find, startsWith, or operator==.

We don’t run the query automatically because the dictionary is unlikely to need to change very often. Instead, we have just checked in a copy of the dictionary file that it created.
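The query results are plain strings, whereas a libFuzzer dictionary file expects one quoted entry per line, with backslashes and double quotes escaped and non-printable bytes written as `\xNN`. The sketch below shows one way that conversion could be done, assuming the query results have already been exported as a list of strings. The `to_dict_entry` helper is hypothetical and is not part of Exiv2 or CodeQL.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format one string as a quoted dictionary entry.
static std::string to_dict_entry(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        if (c == '\\' || c == '"') {
            // Backslash and double quote are escaped with a backslash.
            out += '\\';
            out += static_cast<char>(c);
        } else if (c >= 0x20 && c < 0x7f) {
            // Printable ASCII is copied through unchanged.
            out += static_cast<char>(c);
        } else {
            // Everything else is written as a \xNN hex escape.
            char hex[5];
            std::snprintf(hex, sizeof hex, "\\x%02X", static_cast<unsigned>(c));
            out += hex;
        }
    }
    out += '"';
    return out;
}
```

For instance, the string type= from the bug above becomes the dictionary line "type=", including the surrounding quotes.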

Conclusion

Enrolling Exiv2 in OSS-Fuzz was quite a bit of work, but definitely worthwhile. Exiv2 has now been fuzzed far more thoroughly than ever before and lots of bugs came out of the woodwork. Thanks to OSS-Fuzz, Exiv2 is now fuzzed continuously, so it will be much harder for new bugs to creep in.

This was the final post in the series about how I am helping to harden the security of the Exiv2 project. I hope it has given you some ideas for how to secure your own project!

Follow GitHub Security Lab on Twitter for the latest in security research.

Written by

I'm a security researcher on the GitHub Security Lab team. I try to help make open source software more secure by searching for vulnerabilities and working with maintainers to get them fixed.
