Securing the open source supply chain: The essential role of CVEs

Vulnerability data has grown in volume and complexity over the past decade, but open source and programs like the GitHub Security Lab have helped supply chain security keep pace.

12 minute read

As security continues to shift left, developers are increasingly the first line of defense against vulnerabilities. In fact, open source developers now spend nearly 3x more time on security compared with a few years ago—which is an incredible development, considering how much the world relies on open source software.

I’m Madison Oliver and I manage the team that curates the vulnerability data here within the GitHub Security Lab, where developers and security professionals come together to secure the open source software (OSS) we all rely on.

At GitHub, we’re dedicated to supporting open source—including securing it.

Our researchers are constantly keeping the OSS ecosystem aware of vulnerabilities by discovering and disclosing new ones, educating the community with research, performing variant analysis for OSS projects, and curating the GitHub Advisory Database, our own vulnerability reporting database dedicated to open source. We also regularly assign and publish vulnerabilities to the broader vulnerability management ecosystem on behalf of OSS maintainers. Much of this is done through Common Vulnerabilities and Exposures (CVEs).

Whether you’re contributing to open source software, maintaining a project, or just relying on open source like most developers, CVEs help keep that software secure. CVEs and the broader vulnerability landscape have grown and changed drastically in recent years, but we’ve kept pace by empowering the open source community to improve their software security through policies, products, open source solutions, and security automation tools.

Let’s jump in.

What is a CVE?

A CVE is a unique identifier for a vulnerability published in the CVE List, a catalog of vulnerability records. MITRE, a non-profit spun out of the MIT Lincoln Laboratory, maintains the catalog and the CVE Program with the goal of identifying, defining, and cataloging publicly disclosed cybersecurity vulnerabilities. We actively contribute back to the CVE Program by serving on working groups and the CVE Program board, providing feedback, and helping ensure that this critical security program is aligned with open source developers’ needs.
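
For context, every CVE ID follows one simple pattern: the literal prefix CVE, a year, and a sequence number of at least four digits. Here’s a minimal sketch in Python (not part of any official CVE tooling) for validating and splitting an ID:

```python
import re

# CVE IDs look like CVE-<year>-<sequence>, where the sequence is at least
# four digits (for example, CVE-2019-16760, which comes up later in this post).
CVE_ID_PATTERN = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def parse_cve_id(cve_id: str) -> tuple[int, int]:
    """Return (year, sequence) for a well-formed CVE ID, or raise ValueError."""
    match = CVE_ID_PATTERN.match(cve_id.strip().upper())
    if not match:
        raise ValueError(f"not a valid CVE ID: {cve_id!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_cve_id("cve-2019-16760"))  # (2019, 16760)
```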

At GitHub, we have a unique perspective on CVEs: we not only use CVE data for our internal vulnerability management and for our Advisory Database, which feeds into Dependabot, code scanning, npm audit, and more, but also create it. Since 2019, we have managed two CVE Numbering Authorities (CNAs)—special entities allowed to assign and publish CVE IDs—where we produce CVE data, much of it provided by OSS maintainers. In 2019, we published 29 CVEs (CVE-2019-16760 was the first) and last year, we published more than 1,700.

We’re not the only ones seeing an increase in vulnerability data. In fact, it’s one of the bigger industry-wide changes of the past decade, and it has implications for how we keep software secure.

The double-edged sword of increased vulnerability data

When the CVE Program originated in 1999, it published 321 CVE records. Last year, it published more than 28,900, a 460% increase over the past decade, and that number is expected to continue growing.

The graph shows the total number of CVEs published per year from 1999 to 2023. It begins with fewer than 2,500 CVEs annually, rising gradually until 2016, when it reaches around 5,000. From 2017, the number increases sharply, nearing 30,000 by 2023.

This growth means downstream consumers of vulnerability data—like yourself and our Advisory Database curation team—have more and more vulnerability data to sift through each day, which can lead to information overload. It also means increased vulnerability transparency (that is, making vulnerability information publicly available), which is fundamental to improving security across the industry. After all, you can’t address a vulnerability if you don’t even know about it. So, while this increase in data may seem overwhelming, it also means we are becoming much more aware.

But we’re not just dealing with more data; we’re facing a larger variety of vulnerabilities that have a greater impact through network effect. Thankfully, better data sources and increased automation can help manage the deluge. But first, let’s better understand the problem.

New vulnerability types and their widening impact through the software supply chain

When a novel vulnerability type is disclosed, it very often spurs researchers to seek, and find, more examples of this new class, leading to a flood of new information. For example, an abundance of speculative execution vulnerabilities followed the Spectre and Meltdown disclosures, and, to the chagrin of many open source developers, regular expression denial-of-service (ReDoS) attacks have increased since 2021 (though they’ve been around much longer than that).

We’ve also seen an increase in cloud-related vulnerability disclosures—not because the cloud is a new concept, but because disclosing vulnerabilities in cloud-related products has long been a point of contention. The prevailing reasoning against disclosing cloud vulnerabilities was to avoid generating alerts that don’t require user action to remediate. But as cloud-based technologies have become integral to modern development, the need for transparency has outweighed the perceived negatives. The CVE rules were updated this year to encourage disclosures, and major vendors like Microsoft have publicly stated that they’re making this change to support transparency, learning, safety, and resilience of critical cloud software.

Beyond cloud vulnerabilities, the 2021 edition of the OWASP Top 10 also saw an increase in vulnerabilities due to broken access controls, insecure design, security misconfigurations, vulnerable or outdated components (like dependencies), and security logging and monitoring failures.

Whatever the vulnerability type, new categories of vulnerabilities often require new remediation and prevention tactics that developers must stay on top of to keep their projects secure. When a surge of research leads to a dramatic increase in vulnerability disclosures, development teams may need to spend significant effort to validate, deduplicate, and remediate these unfamiliar vulnerabilities. This creates a huge, time-sensitive burden for those responding, precisely when delays should be avoided.

To complicate matters even further, these new vulnerability types don’t even need to be in your code to affect you—they can be several layers down. The open source libraries, frameworks, and other tools that your project depends on to function—its dependencies—form the foundation of its supply chain. Just like vulnerabilities in your own code, vulnerabilities in your supply chain can pose a security risk, compromising the security of your project and its users.

For example, our 2020 Octoverse report found that the median JavaScript project on GitHub used just 10 open source dependencies directly. That same project, however, can have 683 transitive dependencies. In other words, even if you only directly include 10 dependencies, those dependencies come with their own transitive dependencies that you inherit. Lacking awareness of transitive dependencies can leave you and your users unknowingly at risk, and the sheer number of transitive dependencies for the average project requires automation to scale.
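
To make that inheritance concrete, here’s a minimal, self-contained sketch using a toy dependency graph (the package names are hypothetical, and this is not how GitHub’s dependency graph is implemented) that shows how two direct dependencies fan out into a larger transitive set:

```python
# A toy dependency graph: each package maps to the packages it depends on directly.
# Real resolvers (npm, pip, and others) also handle versions and conflicts; this
# sketch only illustrates how direct dependencies fan out transitively.
DEPENDENCIES = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "templating"],
    "templating": ["string-utils"],
    "logger": ["string-utils"],
    "http-parser": [],
    "string-utils": [],
}


def transitive_dependencies(package: str, graph: dict[str, list[str]]) -> set[str]:
    """Collect every package reachable from `package`, excluding the package itself."""
    seen: set[str] = set()
    stack = list(graph.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        stack.extend(graph.get(dep, []))
    return seen


direct = DEPENDENCIES["my-app"]
everything = transitive_dependencies("my-app", DEPENDENCIES)
print(f"{len(direct)} direct dependencies, {len(everything)} total")  # 2 direct dependencies, 5 total
```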

More security data coming straight from open source

Since it started in 2019, the GitHub CNA, which is dedicated to open source, has grown so much that we are now the 5th largest CVE publisher of all time. This shows that open source maintainers—who are the source of this data—want to play a role in the security landscape. This is a big shift, and it’s a boon for open source and everyone who relies on it.

The graph shows the percentage of all CVEs published by the GitHub CNA from 2019 to 2023. Starting just above 0% in 2019, the percentage rises steeply to around 5% in 2021. After a slight dip, it increases again to approximately 6.5% by 2023.

Indulge me for a brief history lesson to illustrate the degree of this change.

For a significant portion of the CVE Program’s history, MITRE was the primary entity authorized to create, assign, and publish CVE IDs. While CNAs have existed since the program’s inception, their only duty until 2016 was to assign IDs. As demands on the program grew, scalability issues and data quality concerns followed, so the program expanded CNA duties to include curating and publishing CVE record details. Since then, the program has made significant efforts to increase the number of CNAs engaged in the program, and it now has over 400 partners from 40 countries.

Nowadays, more than 80% of CVE data originates directly from CNAs like ours. The explosion in CNAs has helped scale the program to better support the growing number of requests for CVEs and, just as importantly, means more primary sources of vulnerability data. Unlike a CNA of Last Resort (CNA-LR) such as MITRE, whose scope is everything that isn’t already covered, each CNA must have a specific scope of coverage, and that scope tends to include software the CNA owns or is heavily invested in securing.

The graph compares the percentage of CVEs assigned by CNAs and CNA-LRs from 1999 to 2024. Initially, CNA-LRs assign 100% of CVEs but rapidly drop to 50% in 2017. From 2017 onward, the percentage of CVEs assigned by CNAs increases steadily, surpassing 80% by 2024.

The overwhelming benefit of this structure is that subject matter experts can control the messaging and vulnerability information shared with unique communities, leading to higher quality data and fewer false positives. For us, this means a larger focus on securing open source software—and for any developer who contributes to or uses open source, that means more secure outcomes. Maintainers can ensure that they’re the primary source of vulnerability information on GitHub by leveraging repository security advisories to notify their end users and by enabling private vulnerability reporting to help ensure that new vulnerabilities reach them first.
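
Both of those steps can also be automated. The sketch below uses the GitHub REST API’s private vulnerability reporting and repository security advisory endpoints; the repository is hypothetical, the token needs the appropriate repository permissions, and it’s worth confirming the exact endpoints and field names against the current REST docs before relying on them:

```python
import os

import requests

OWNER, REPO = "octocat", "hello-world"  # hypothetical repository
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# Opt the repository in to private vulnerability reporting so researchers can
# reach the maintainers before details become public.
resp = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/private-vulnerability-reporting",
    headers=HEADERS,
)
resp.raise_for_status()

# List the repository's published security advisories.
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/security-advisories",
    headers=HEADERS,
)
resp.raise_for_status()
for advisory in resp.json():
    print(advisory["ghsa_id"], advisory["severity"], advisory["summary"])
```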

Looking to level up your skills in software security?

Check out the GitHub Secure Code Game developed by our Security Lab. It provides an in-repository learning experience where users can secure intentionally vulnerable code!

Tackling data overload with automation

Let’s return to that double-edged sword we started with: there’s more vulnerability data than ever before, which means more visibility and transparency, but also more data to sort through. The answer isn’t to reduce the amount of data. It’s to use automation to support easier curation, consumption, and prioritization of vulnerability data.

Keeping track of direct dependencies at scale can be burdensome, but the sheer number of transitive dependencies can be overwhelming. To keep up, software bill of materials (SBOM) formats like SPDX and CycloneDX allow users to create a machine-readable inventory of a project’s dependencies along with information like versions, package identifiers, licenses, and copyright (a sketch of pulling one follows the list below). SBOMs help reduce supply chain risks by:

  • Providing transparency about the dependencies used by your repository.
  • Allowing vulnerabilities to be identified early in the process.
  • Providing insights into the security, license compliance, or quality issues that may exist in your code base.
  • Enabling you to better comply with various data protection standards through automation.
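
As mentioned above, here’s a rough sketch of pulling the SPDX-format SBOM that GitHub’s dependency graph can export for a repository and listing the packages it contains. The endpoint and response shape are assumptions based on the public REST docs, and the repository is hypothetical:

```python
import os

import requests

OWNER, REPO = "octocat", "hello-world"  # hypothetical repository
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/dependency-graph/sbom",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
)
resp.raise_for_status()

sbom = resp.json()["sbom"]  # an SPDX document
for package in sbom.get("packages", []):
    # Each entry carries the package name and version GitHub detected.
    print(package.get("name"), package.get("versionInfo", "unknown version"))
```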

When it comes to reporting vulnerabilities, CVE Services has been extremely helpful in reducing the friction for CNAs to reserve CVE IDs and publish CVE Records by providing a self-service web interface—and that CVE data is a critical data source for us. It accounts for over 92% of the data feeding into our Advisory Database, so anything that helps ensure this information is published faster and more efficiently benefits those using the data downstream, like our team, and by proxy, developers on GitHub!

The chart shows the percentage contributions of various vulnerability data feeds to the GitHub Advisory Database. CVE via NVD dominates with 92.41%, followed by other smaller contributors like Repository GHSA at 3.80% and PYPA at 1.85%, with several other sources contributing less than 1% each.

At GitHub, we leverage APIs from our vulnerability data providers to ingest data for review, export our data in the machine-readable Open Source Vulnerability (OSV) format for consumption by others, and notify our users automatically through Dependabot alerts. While increased automation around vulnerability data has made reporting and consuming that data easier, it has also increased the need to automate the downstream impact for developers: finding and fixing vulnerabilities both in your own code and within your dependencies.
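
To show what consuming OSV-formatted data can look like, here’s a small sketch that queries the OSV.dev API, which aggregates sources including the GitHub Advisory Database, for a deliberately old, illustrative package version:

```python
import requests

# Ask OSV.dev for known vulnerabilities affecting a specific package version.
query = {
    "package": {"name": "lodash", "ecosystem": "npm"},
    "version": "4.17.15",
}
resp = requests.post("https://api.osv.dev/v1/query", json=query, timeout=30)
resp.raise_for_status()

for vuln in resp.json().get("vulns", []):
    # Each record follows the OSV schema: an ID, aliases such as the
    # corresponding CVE ID, and a human-readable summary.
    print(vuln["id"], vuln.get("aliases", []), vuln.get("summary", ""))
```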

Automated software composition analysis (SCA) tools like Dependabot help identify and mitigate security vulnerabilities in your dependencies by automatically updating your packages to the latest version or filing pull requests for security updates. Keep in mind that the coverage of SCA tools can vary in scope—the GitHub Advisory Database and osv-scanner emphasize vulnerabilities in open source software, while grype focuses on container scanning and file systems—so ensure that your SCA solution supports the types of software in your production environment. Prioritization features in SCA tools, like Dependabot’s preset and custom auto-triage rules, can help users more efficiently manage data overload by helping determine which alerts should be addressed.
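
If you want to run a similar prioritization pass outside the product, Dependabot alerts are also exposed through the REST API. This sketch assumes the documented Dependabot alerts endpoint and illustrative field names (and a hypothetical repository); auto-triage rules achieve the same goal without any custom code:

```python
import os

import requests

OWNER, REPO = "octocat", "hello-world"  # hypothetical repository
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/dependabot/alerts",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    params={"state": "open", "per_page": 100},
)
resp.raise_for_status()

# Surface the most severe alerts first as a simple triage pass.
severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
alerts = sorted(
    resp.json(),
    key=lambda alert: severity_order.get(alert["security_advisory"]["severity"], 4),
)
for alert in alerts:
    advisory = alert["security_advisory"]
    print(advisory["severity"], advisory.get("cve_id"), alert["dependency"]["package"]["name"])
```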

Static application security testing (SAST) and SCA tools both help you detect vulnerabilities. While SCA is geared towards addressing open source dependencies, SAST focuses more on vulnerabilities in your proprietary code. GitHub’s SAST tools like code scanning and CodeQL help find vulnerabilities in your code, while Copilot Autofix simplifies vulnerability remediation by providing natural language explanations for vulnerabilities and suggesting code changes.
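
Code scanning (SAST) findings can be pulled the same way, so SCA and SAST results can be reviewed side by side. As before, the endpoint and field names are assumptions based on the public REST docs and the repository is hypothetical:

```python
import os

import requests

OWNER, REPO = "octocat", "hello-world"  # hypothetical repository
resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/code-scanning/alerts",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    params={"state": "open"},
)
resp.raise_for_status()

for alert in resp.json():
    rule = alert["rule"]
    location = alert["most_recent_instance"]["location"]
    # Prefer the security severity when the rule has one; fall back to the rule severity.
    severity = rule.get("security_severity_level") or rule.get("severity")
    print(severity, rule["id"], location.get("path"))
```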

As the vulnerability landscape continues to evolve and aspects of vulnerability management shift left, it’s critical that open source developers are empowered to engage in security. The double-edged sword of increased vulnerability data means more awareness, but requires automation to manage properly, especially when considering the wider impact that supply chain vulnerabilities can cause. At GitHub and within the Security Lab, we are committed to continuing to support and secure open source by providing tools, guidance, and educational resources for the community as software and vulnerabilities progress.

Written by

Madison Oliver

@taladrane

Madison Oliver is a vulnerability transparency advocate and senior security manager at GitHub, leading the advisory database curation team. She is passionate about vulnerability reporting, response, and disclosure, and co-chairs the relevant Open Source Security Foundation (OpenSSF) working group and serves on the CVE Program Board. Her views are enriched by her prior experience as a product incident response analyst at GitHub and as a vulnerability coordinator at the CERT Coordination Center at the Software Engineering Institute at Carnegie Mellon University.
