Kevin Backhouse
I'm a security researcher on the GitHub Security Lab team. I try to help make open source software more secure by searching for vulnerabilities and working with maintainers to get them fixed.
When you’re fixing a bug, especially a security vulnerability, you should add a regression test, fix the bug, and find & fix variants.
This blog is the second in a four-part series about how I am helping to harden the security of the Exiv2 project. Today, I want to talk about the three things that I believe you should always do when you are fixing a bug, especially when the bug is a security vulnerability:
As an example, this pull request, which fixes GHSA-pvjp-m4f6-q984, has three commits corresponding to those three steps.
Step 2 is obvious, but why are steps 1 and 3 important, and why should you do them in that order?
The purpose of regression testing is to make sure that you never make the same mistake twice. When you fix a bug, you should add a regression test, which is designed to fail if the same bug is ever reintroduced. I don’t think regression testing is controversial, but it often feels like a waste of time to write a test for a bug that you already fixed. Realistically, how likely is it that somebody will ever reintroduce the exact same bug? It’s a tempting step to skip if you’re in a hurry. These are the reasons why I think it’s important:
Not everybody agrees with this approach. In previous software development jobs, I have worked with people who believe that you should only fix what’s verifiably broken and nothing else, especially when you’re fixing the bug in a released version of the software rather than on the development branch. They’re scared that by changing code that you don’t need to, you might accidentally introduce a new bug. But I think that’s the wrong risk to worry about. The reality is that if you find one bug, there are almost always others nearby. We have so much evidence of that from the miserable history of computer security, that it should no longer be in dispute. However, it’s clear that many software vendors persist in following the minimal fix strategy; in the worst case only fixing the symptoms of the bug rather than the root cause to the great frustration of security researchers. With this in mind, I am a firm believer that every bug is an opportunity to toss in a few extra defensive fixes.
For the rest of this blog post, I want to show you a couple of examples from Exiv2 of defensive fixes that I have added. In both examples, I’ll also show you how I used a simple CodeQL query to find variants in other parts of the codebase.
This vulnerability is a crash caused by an integer divide by zero in minoltamn_int.cpp:
long focalLength = getKeyLong ("Exif.Photo.FocalLength" ,metadata); long focalL35mm = getKeyLong ("Exif.Photo.FocalLengthIn35mmFilm",metadata); long focalRatio = (focalL35mm*100)/focalLength; <===== Divide by zero
On line 2174, focalLength
is read from the image metadata, which means that it’s an attacker-controlled value. If the value is zero then it causes a crash on line 2176.
As you can see in the pull request, I fixed an almost identical bug on line 2183. But I also expanded the search to look for integer divisions that could potentially cause a crash due to INT_MIN/-1
, which is another less well known gotcha of integer division.
This is the CodeQL query that I used:
/** * @kind: problem */ import cpp import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis from DivExpr div, Expr rhs where rhs = div.getRightOperand() and div.getType() instanceof IntegralType and not lowerBound(rhs) > 0 select rhs, "Possible integer divide by zero. Type: " + rhs.getType().toString()
The query is very simple. It looks for an integer division operator whose right hand operand isn’t obviously greater than zero. (Exiv2 is unlikely to ever divide by a negative number on purpose, so any division operator with a possibly negative right-hand operand is suspect. Dividing by a negative number might enable an attacker to trigger the INT_MIN/-1
crash that I mentioned earlier.) I used the SimpleRangeAnalysis library to implement the greater-than-zero condition. Range analysis computes upper and lower bounds for expressions in the program. For example, it might deduce that the index variable of a simple for-loop is in the range 0..9. However, as its name suggests, it is only able to deduce accurate ranges in relatively simple cases1, so it often defaults to the range INT_MIN..INT_MAX (with values depending on the type of the expression). As a result, the query produces quite a few false positive results. You can see the results of the query on a recent version of Exiv2 here. There are 24 results, all of which I believe are false positives. (I already fixed everything that looked like a genuine bug.) Here’s an example of one of the false positives in value.hpp:
ok_ = (value_.at(n).second > 0 && value_.at(n).first < LARGE_INT); if (!ok_) return 0; return value_.at(n).first / value_.at(n).second;
It’s clear that this division is safe, but the logic is a bit too complicated for SimpleRangeAnalysis to analyze.
There’s a trade-off between spending more time improving the accuracy of the CodeQL versus just manually auditing the results. In this particular case, I didn’t think it was worth spending any more time on the query, since integer divisions are relatively rare in the Exiv2 codebase, so they’re easy to audit manually. Some queries are worth spending more time on though—as I will discuss in the next blog post in this series.
std::vector
provides two indexing methods: at()
and operator[]
. The former throws an exception if the index is out-of-bounds, but the latter does not. Therefore, it is safer to use at()
unless you are very sure that the index is in bounds.
The crash (#1706) was caused by an out-of-bounds access in value.hpp:
template long ValueType::toLong(long n) const { ok_ = true; return static_cast(value_[n]); }
toLong()
is a utility function that is used in many places. Any one of those call-sites might accidentally call toLong()
with an out-of-bounds index, so it would be a substantial auditing task to check them all. Instead, I decided that it would be much safer to use at()
here.
As you can see in the pull request, I replaced quite a few instances of operator[]
with at()
, mostly in the same header file. This is the CodeQL query that I used to search for variants:
/** * @kind: problem */ import cpp import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis from FunctionCall call, ClassTemplateInstantiation t, TemplateClass c where call.getQualifier().(VariableAccess).getTarget().getName() = "value_" and call.getTarget().getName() = "operator[]" and t = call.getQualifier().getType().getUnspecifiedType() and c = t.getTemplate() and c.getSimpleName() != "map" select call, "Unsafe use of " + c.getSimpleName() + "::operator[]."
This query is very simplistic because it restricts the results to vectors named value_
. If that clause is commented out, then the query has 356 results, which is too many.
Unfortunately, my simplistic query was not good enough to catch all the variants, because another crash caused by operator[]
was discovered a few weeks later (GHSA-v5g7-46xf-h728). So I decided to spend some time improving the accuracy of this query, as I will discuss in more detail in my next blog post.
On several other occasions, I used simplistic CodeQL queries to quickly find variants. Here’s a short list in case you are interested in looking at other CodeQL examples:
Security bugs are rarely unique. Usually, there are similar bugs elsewhere in the code. That’s why I believe that every security bug is an opportunity to toss in a few extra defensive fixes. If you’re already going through the hassle of doing a new release to fix a security problem, then why not take the opportunity to do some extra security hardening?
In this blog post, I showed you a couple of examples of security bugs that I fixed in Exiv2. In both cases, I did three things:
CodeQL is a very useful tool for finding variants, as demonstrated through both examples in this blog post. However, the CodeQL queries that I showed in this blog post were a bit simplistic, which led to a higher percentage of false positives (which is fine when you’re searching for other bugs to fix, but not fine if you’re trying to turn it into a formal rule). In the next blog post, I will show how I took one of the unpolished queries from this blog post and improved it so that it could be added to Exiv2’s continuous integration test suite—showing the true power of CodeQL.
Follow GitHub Security Lab on Twitter for the latest in security research.
1 I was the original author of SimpleRangeAnalysis a few years ago, so I am responsible for its limitations. 😳
GitHub’s annual month-long game jam, where creativity knows no limits! Throughout November, dive into your favorite game engines, libraries, and programming languages to bring your wildest game ideas to life. Whether you’re a seasoned dev or just getting started, it’s all about having fun and making something awesome!
Git 2.47 is here, with features like incremental multi-pack indexes and more. Check out our coverage of some of the highlights here.
Let’s take a closer look at some of the stars of the Open Source Zone at GitHub Universe 2024 🔎