The GitHub Security Lab’s journey to disclosing 500 CVEs in open source projects
The GitHub Security Lab audits open source projects for security vulnerabilities and helps maintainers fix them. Recently, we passed the milestone of 500 CVEs disclosed. Let’s take a trip down memory lane with a review of some noteworthy CVEs!
When I stepped onto the scale this morning, I was reminded that some numbers feel awkward to celebrate, while others are well worth celebrating! Recently, the GitHub Security Lab passed the milestone of 500 CVEs disclosed to open source projects. What's a CVE? In short, it's the record of a security vulnerability, published under the CVE program to inform affected users. So, finding more vulnerabilities in open source shouldn't be good news, right? Even as developer communities get better at keeping their projects secure, security issues can still slip through their defenses. This means there will always be a need for security researchers, like the Security Lab, to discover these issues and help fix them.
If you're not familiar with the Security Lab, we're a team of security experts who work with the broader open source community to help fix security issues in their projects, with the goal of improving the overall security posture of open source. Our core activity is auditing open source projects (not only those hosted on GitHub) and helping their maintainers fix the vulnerabilities we find, for free. This research is the foundation for our other activities, such as education, tooling, and improving our open source static analysis rules. And now we are celebrating more than 500 CVEs disclosed. 🎉
How did we get here?
The history of the Security Lab dates back to Semmle, the company that created CodeQL and was later acquired by GitHub. 2017 was a pivotal year, as we realized how powerful our product could be for finding security vulnerabilities. Unlike many other static analysis tools, CodeQL codifies insecure patterns efficiently, making it possible to respond quickly to new security threats at scale. To showcase this capability, Semmle created a small security research team that used CodeQL to search for vulnerabilities in open source projects, along with a web portal named LGTM.com, where any open source project could run CodeQL for free and be alerted to potential security flaws directly in its pull requests. This approach grew into an important company objective: find and fix vulnerabilities in open source at scale. It was a way of giving back to the open source community, as any software company should.
In September 2019, GitHub acquired Semmle, providing an ideal home for advancing the goal of improving open source security at scale. This led to the creation of the Security Lab, with a larger team and new initiatives, including curating the GitHub Advisory Database. The GitHub Advisory Database provides developers with the most accurate information about known security issues in their open source dependencies. GitHub also incorporated CodeQL as a foundation of code scanning and a core pillar of GitHub Advanced Security (GHAS), keeping it free for open source. Code scanning reached parity with LGTM.com in 2022.
We have also expanded beyond CodeQL and now use a variety of tools in our audit activities, such as fuzzing. But CodeQL remains one of the most effective tools in our toolbox, because it enables us to conduct variant analysis at scale, and allows us to share our knowledge of insecure patterns with the community, in the form of executable CodeQL queries.
The secret? Our maintainers-first approach
Not all reports get a CVE. CVE records exist to inform downstream consumers, so when there is no downstream consumer, there is no need for a CVE. For example, a vulnerability in a CI workflow, or one discovered in a development branch and fixed before it reached any release, does not require a CVE. While we are credited with 500 CVEs, we have actually reported and helped fix over 1,000 vulnerabilities. But who's counting, right?
That said, what matters most to us is our fix rate. Across the tens of thousands of reports in the GitHub Advisory Database, about 80% are fixed by maintainers on average. The fix rate for vulnerabilities reported by the Security Lab is much higher: 96% of our reports end up with a fix. This reflects both the validity of our reports and our effective collaboration with maintainers. We want project maintainers to succeed, so we are flexible on the disclosure timeline when it's safe for the rest of the community, we provide fix suggestions, and we always help test the new release. Our report template is open source, and any security researcher is welcome to use it as inspiration for their own reports.
Now, let's take a look at some vulnerabilities that stand out!
Highlights from our first 500
CVE-2017-9805: Remote Code Execution vulnerability in Apache Struts
The bug that started it all. Man Yue Mo found an unsafe deserialization vulnerability in Apache Struts, which enabled an unauthenticated remote attacker to execute arbitrary code. Apache Struts was already in the news at the time, because an older vulnerability, CVE-2017-5638, had been exploited in the Equifax breach. Mo, who was then still working on Semmle's data science team, found the bug by tweaking the CodeQL query for unsafe deserialization.
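The Struts bug lived in Java code, but the underlying risk is easy to see in any language whose deserializer can rebuild arbitrary objects. As a minimal sketch, here is an analogy using Python's `pickle` module rather than the Struts code path: the serialized bytes themselves can instruct the loader to call a function of the attacker's choosing, so deserializing attacker-controlled input amounts to running attacker-chosen code. The `Gadget` class and the harmless `eval` payload are hypothetical, chosen only for illustration.

```python
import pickle


class Gadget:
    """Hypothetical attacker-crafted object, used only for illustration."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object: "call this
        # callable with these arguments". An attacker can name any importable
        # callable; here it is a harmless eval of an arithmetic expression.
        return (eval, ("6 * 7",))


# The attacker serializes the gadget and sends the bytes to the victim.
payload = pickle.dumps(Gadget())

# The victim merely deserializes untrusted input, and eval() runs during loading.
result = pickle.loads(payload)
print(result)  # 42, not a Gadget instance
```

The general defense is the same in every language: never feed untrusted bytes to a deserializer that can instantiate arbitrary types, and prefer data-only formats such as JSON for external input.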
CVE-2018-4407: Kernel crash caused by out-of-bounds write in Apple’s ICMP packet-handling code
By exploiting an integer overflow in the XNU kernel's networking code, a malicious TCP packet could trigger an out-of-bounds memory access that would instantly crash the kernel and reboot any Mac or iOS device on the same network as the attacker, without user interaction. It even had a tweetable proof of concept (PoC).
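The XNU code is not reproduced here. As a generic, hypothetical sketch of this bug class, the following Python simulates a length computed in a 16-bit C integer (the 16-bit width and the 64-byte buffer are assumptions chosen for the example). When the sum wraps around, the bounds check passes even though the subsequent copy would write far past the end of the buffer. `BUF_SIZE`, `buggy_check`, and `safe_check` are illustrative names.

```python
BUF_SIZE = 64  # hypothetical fixed-size destination buffer, in bytes


def u16(value):
    """Simulate C's uint16_t arithmetic by wrapping modulo 2**16."""
    return value & 0xFFFF


def buggy_check(header_len, payload_len):
    # The total is computed in a 16-bit field, so a large payload_len can
    # wrap around to a small number and slip past the bounds check.
    total = u16(header_len + payload_len)
    return total <= BUF_SIZE


def safe_check(header_len, payload_len):
    # Compare each part against the remaining space, so no sum can wrap.
    return header_len <= BUF_SIZE and payload_len <= BUF_SIZE - header_len


header_len, payload_len = 16, 0xFFF8
print(buggy_check(header_len, payload_len))  # True: 16 + 65528 wraps to 8
print(safe_check(header_len, payload_len))   # False
print(header_len + payload_len)              # 65544 bytes would be written
```

Rewriting the check as a subtraction from a known-good bound, rather than an addition that might overflow, is a common way to avoid this pattern in C.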
GHSL-2020-204: Remote Code Execution in Corona Warn App Server
A Remote Code Execution (RCE) vulnerability was found in the server behind Germany's COVID contact-tracing application. An unauthenticated attacker could have fully compromised the server to which citizens sent their anonymous infection information so that other German citizens could be alerted to possible exposure. This is a good example of a vulnerability that did not require a CVE, since the Corona Warn App (CWA) was only deployed and used by the German and Belgian governments.
CVE-2021-3560: Privilege escalation with polkit
polkit is a system service installed by default on many Linux distributions, including popular ones such as RHEL and Ubuntu. A race condition vulnerability in polkit enabled an unprivileged local user to get a root shell. The bug was in error-handling code and could be triggered by having the client disconnect too early, while polkit was still processing its request.
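polkit itself is written in C on top of D-Bus, and its code is not reproduced here. As a simplified, hypothetical Python sketch of how error-handling bugs like this can "fail open", the buggy version below treats a failed identity lookup as if it had returned UID 0 (root), while the fixed version denies the request. The race window, where the client disconnects mid-request, is modeled as a simple `connected` flag; all class and function names are invented for illustration.

```python
class ClientDisconnected(Exception):
    """Raised when the requesting client is gone before its UID is read."""


class Client:
    """Hypothetical stand-in for a client connected to a system service."""

    def __init__(self, uid, connected=True):
        self.uid = uid
        self.connected = connected

    def query_uid(self):
        if not self.connected:
            raise ClientDisconnected()
        return self.uid


ROOT_UID = 0


def is_authorized_buggy(client):
    try:
        uid = client.query_uid()
    except ClientDisconnected:
        # Fail-open bug: the error path falls back to a default UID of 0,
        # so a client that disconnects at the right moment looks like root.
        uid = 0
    return uid == ROOT_UID


def is_authorized_fixed(client):
    try:
        uid = client.query_uid()
    except ClientDisconnected:
        # Fail closed: if the caller cannot be identified, deny the request.
        return False
    return uid == ROOT_UID


attacker = Client(uid=1000, connected=False)
print(is_authorized_buggy(attacker))  # True
print(is_authorized_fixed(attacker))  # False
```

The design lesson is that any code path reached on error in a security decision should default to "deny", since attackers can often force errors on purpose.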
CVE-2021-45046: Bypass of initial mitigations for Log4Shell
December 2021 may be remembered by Java developers and security folks for an RCE vulnerability found in the popular Log4j logging library. The Java world was shaken by what was probably the worst vulnerability ever to affect the Java ecosystem. The Apache maintainers quickly published a patch for it; however, our researchers found that the fix was not sufficient and reported to the maintainers a bypass that worked on certain operating systems.
Multiple script injections and “pwn request” vulnerabilities in implementations of GitHub Actions workflows
We noticed emerging insecure patterns in GitHub Actions workflows and helped fix more than a hundred instances in open source projects. We also published guidelines and CodeQL queries to find these types of vulnerabilities, as well as an open source tool that helps users set the right permissions for the tokens used in these pipelines, limiting the damage in case of an exploit. Since the vulnerabilities were in the implementation of CI/CD pipelines, the reports didn't get CVEs: once the vulnerabilities were fixed, the projects' users did not need to take any action.
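To make the script injection pattern concrete: when an expression such as `${{ github.event.issue.title }}` appears inside a `run:` script, it is substituted into the shell script before the script executes, so a crafted issue title becomes shell code. The following is a rough, hypothetical Python heuristic (not one of the Security Lab's CodeQL queries) that flags such interpolations in `run:` blocks. It scans lines instead of parsing YAML, and its list of untrusted fields is deliberately partial. The `Safer` step in the sample passes the value through an environment variable, which is the commonly recommended mitigation.

```python
import re

# Partial, illustrative list of attacker-controllable contexts.
UNTRUSTED_EXPR = re.compile(
    r"\$\{\{\s*(?:github\.event\.[\w.]*\.(?:title|body|message|ref|label)"
    r"|github\.head_ref)\b\s*\}\}"
)
RUN_KEY = re.compile(r"(-\s+)?run:")


def find_script_injections(workflow_text):
    """Return (line_number, line) pairs where untrusted input is
    interpolated directly into a `run:` script."""
    findings = []
    run_col = None  # column of the active `run:` key, or None
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        # A non-blank line at or left of the `run:` key ends that block.
        if run_col is not None and stripped and indent <= run_col:
            run_col = None
        key = RUN_KEY.match(stripped)
        if key:
            # For "- run:", the key itself starts after the list dash.
            run_col = indent + (len(key.group(1)) if key.group(1) else 0)
        if run_col is not None and UNTRUSTED_EXPR.search(line):
            findings.append((lineno, stripped))
    return findings


WORKFLOW = """\
on: issues
jobs:
  greet:
    runs-on: ubuntu-latest
    steps:
      - name: Unsafe
        run: |
          echo "Title: ${{ github.event.issue.title }}"
      - name: Safer
        env:
          TITLE: ${{ github.event.issue.title }}
        run: echo "Title: $TITLE"
"""

for lineno, text in find_script_injections(WORKFLOW):
    print(lineno, text)
```

A line-based scan like this misses multi-line expressions and data that flows through intermediate steps, which is exactly where a dataflow-aware tool such as CodeQL does better.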
CVE-2022-20186: Privilege escalation in Arm Mali GPU
This vulnerability in the Arm Mali GPU kernel driver could be used by an untrusted app on a Pixel 6 to gain arbitrary kernel memory access, and ultimately to gain root privileges and disable SELinux.
The road to the next 500 CVEs
With the continuous improvement of CodeQL and the ongoing modeling of new frameworks, turbocharged by Large Language Models (LLMs), we are disclosing vulnerabilities faster and at a larger scale than ever before. It won't be long before we write again to celebrate the next 500 CVEs.
Our dream, however, is to reach a point where the impact of education and protection efforts, from us and the community at large, balances this audit and disclosure activity and results in fewer vulnerabilities being found in open source code. For example, because CodeQL is available to all projects through code scanning, every improvement to it helps us find more issues; at the same time, wider adoption of code scanning prevents those issues from being introduced in the first place.
But we cannot do that alone. We need all of you.
Assemble! Securing open source is a team effort!
With CodeQL and multi-repository variant analysis, you can multiply the impact of an audit by encoding an insecure pattern as a query and finding every occurrence across your code portfolio; after all, bugs are often copied and pasted between projects. You can also multiply your impact by contributing your CodeQL queries back to the open source repository and sharing them with the community, so that even more occurrences are found and fixed, protecting many projects as well as the open source software supply chain.
If you maintain an open source project, you can enable code scanning and Dependabot for free to immediately benefit from this security knowledge as a first line of defense. I also encourage you to enable private vulnerability reporting, so that teams like the Security Lab that audit open source projects can report issues to you privately and collaborate with you on a fix.
References
- Man Yue Mo (@m-y-mo)
- Kevin Backhouse (@kevinbackhouse)
- Alvaro Muñoz (@pwntester)
- Jaroslav Lobačevski (@jarlob)
- Bean Stalking: Growing Java beans into RCE
- Alvaro Muñoz at BlackHat 2016: A Journey from JNDI/LDAP Manipulation to Remote Code Execution Dreamland