GitHub secret scanning has been securing our users’ code by scanning for and revoking secrets since 2015. Recently, we’ve focused on scanning for package registry credentials as well, a significant expansion of our original service.
Package registry credentials grant access to services hosting packages that hundreds of thousands of other software products rely on. If one of these secrets is leaked, the damage isn’t limited to a single product: it can extend to thousands. Because the impact of leaking these secrets is so large, it’s extremely important that they’re protected.
GitHub has recently collaborated with PyPI and RubyGems to scan for their credentials and help secure the millions of applications that depend on the Python and Ruby open source ecosystems. We also scan for npm, NuGet, and Clojars secrets. In each case, we automatically scan every commit to a public repository or gist for potentially leaked credentials. If we find one, we notify the registry, which automatically revokes the compromised secret and notifies its owner.
This blog post starts with an introduction to secrets, GitHub secret scanning, and the open source supply chain. If these concepts are familiar to you, feel free to skip to the sections on why revoking package registry credentials is so important and which package registry credentials we support.
When writing software, we usually include third-party software to do some of the work for us in order to avoid “reinventing the wheel.” Sometimes third-party software can be imported into our code, but often we have to communicate with an external service in order to use it. To do this, we need some way to authenticate ourselves.
Secrets, also known as tokens, are unique strings that you can use to authenticate yourself, similar to using a username and password to log in to an account. For example, at GitHub, if you want to make changes to your account or repositories using an API, you need to generate a personal access token. With this secret, you can programmatically access your account and make the same changes you’d be able to make if you logged in from a web browser. Because of this, it’s important to keep secrets… well… secret.
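As a minimal sketch of how a token stands in for a username and password, here is how a script might attach a personal access token to a GitHub REST API request using only Python’s standard library. It reads the token from an environment variable rather than hardcoding it, which keeps the secret out of any file you commit. The variable name `GITHUB_TOKEN` and the helper `build_github_request` are assumptions for illustration, and the request is only built here, not sent.

```python
import os
import urllib.request


def build_github_request(path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub REST API request.

    Assumes the personal access token lives in the GITHUB_TOKEN environment
    variable (a hypothetical choice) instead of being written into the code.
    """
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Set GITHUB_TOKEN; never hardcode the token in source files")
    return urllib.request.Request(
        f"https://api.github.com{path}",
        headers={
            # The token authenticates the request, much like a password would.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Usage: urllib.request.urlopen(build_github_request("/user")) would fetch the
# profile of the account that owns the token.
```

Anyone who obtains the token string can build exactly the same request, which is why a committed token is as dangerous as a leaked password.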
So, what happens if you accidentally check a secret into your repo for the whole world to see? That’s where GitHub secret scanning comes in.
GitHub secret scanning is a service that scans pushes to repositories, searching for exposed secrets in order to keep people’s code and third-party accounts secure. We do this automatically for public repositories, and you can enable it on private repositories that are enrolled in GitHub Advanced Security. We partner with over 40 cloud providers to scan for more than 70 different secret types, and that list is growing constantly. When we find a secret, we do one of two things: if it was found in a public repository, we send it straight to the third-party cloud provider for revocation; or, if it was in a private repository with secret scanning enabled, we surface the secret to you directly.
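At its core, this kind of scanning can be sketched as pattern matching over pushed content followed by the routing decision described above. The sketch is illustrative only: the two regular expressions are simplified, hypothetical token formats rather than the patterns GitHub or its partners actually use, and `scan_push` and `route_finding` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified token formats for illustration only; real partner
# patterns are supplied by each provider and are more precise.
SECRET_PATTERNS = {
    "example_registry_token": re.compile(r"\bexreg_[0-9a-f]{32}\b"),
    "example_cloud_key": re.compile(r"\bexcloud_[A-Za-z0-9]{24}\b"),
}


@dataclass
class Finding:
    secret_type: str
    value: str
    line_number: int


def scan_push(content: str) -> list[Finding]:
    """Return every substring of the pushed content that matches a known pattern."""
    findings = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for secret_type, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append(Finding(secret_type, match.group(0), line_number))
    return findings


def route_finding(finding: Finding, repo_is_public: bool) -> str:
    """Mirror the two outcomes: public leaks go to the provider for revocation,
    private ones are surfaced to the repository owner."""
    if repo_is_public:
        return f"notify provider of {finding.secret_type} for revocation"
    return f"alert repository owner about {finding.secret_type} on line {finding.line_number}"
```

Keeping each provider’s pattern in a table makes adding a new partner a matter of adding one entry, which loosely reflects how the list of supported secret types keeps growing.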
If you’re interested in learning more about how secret scanning works, check out our documentation.
As mentioned above, software frequently relies on third-party software (often referred to as “packages”). Sometimes you have to pay for these packages, but there’s an entire community of people who create and maintain software that can be used for free. This free software is what we call “open source.”
Open source software is used by hundreds of millions of products, including other open source packages. One package could use a dozen other packages, and each of those in turn could use even more. If a package stops working or introduces a vulnerability, the software that relies on it can also be affected. This creates a chain of dependencies—a supply chain. The “open source supply chain” is the chain of software dependencies for packages in a program.
If a package in the open source supply chain becomes compromised, all software that uses it, directly or indirectly, becomes compromised as well. This ripple effect is why protecting these packages is paramount. As a result, GitHub offers the dependency graph and Dependabot alerts on all repositories: these tools are crucial for keeping track of your dependencies and knowing when one of them is vulnerable and needs to be updated. GitHub secret scanning also helps protect the open source supply chain, but instead of alerting you to vulnerable dependencies, it helps keep dependencies from becoming vulnerable in the first place by scanning for package registry credentials.
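To make the ripple effect concrete, the sketch below computes every package affected, directly or indirectly, when a single package is compromised. It inverts a dependency graph and walks it breadth-first. The package names and the graph are hypothetical.

```python
from collections import deque


def affected_by(compromised: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return all packages that depend on `compromised`, directly or transitively.

    `depends_on` maps each package to the packages it uses.
    """
    # Invert the graph so we can walk from a dependency up to its dependents.
    dependents: dict[str, list[str]] = {}
    for package, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(package)

    affected: set[str] = set()
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical graph: my-app uses web-framework and cli-tool, and both of those
# rely on tiny-utils, web-framework only indirectly through http-lib.
graph = {
    "my-app": ["web-framework", "cli-tool"],
    "web-framework": ["http-lib"],
    "http-lib": ["tiny-utils"],
    "cli-tool": ["tiny-utils"],
    "unrelated-lib": [],
}
print(sorted(affected_by("tiny-utils", graph)))
# ['cli-tool', 'http-lib', 'my-app', 'web-framework']
```

Compromising one small package at the bottom of the graph reaches the application at the top, even though the application never lists that package as a direct dependency.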
Packages are often hosted by package management services, which developers can use to install, update, and manage the packages they’re using. While these package management services are invaluable, they also create a vector for attack: open source developers need to authenticate themselves when updating their packages in these services, and to do that they often use secrets. So, what happens if a developer accidentally leaks their secret? The fallout could be catastrophic.
To help convey the magnitude of leaking a supply chain secret, let’s look at boto3, the AWS SDK for Python, which is hosted on a package management service called PyPI. boto3 had over 153 million downloads in the month of April alone. Each of those downloads pulls boto3 into someone’s software, and all of that software depends on boto3 to be reliable and secure. If a boto3 maintainer accidentally leaked their PyPI credentials, a malicious actor could use them to publish a version of boto3 containing malware, potentially affecting an enormous number of applications and every one of their users.
Think this is hyperbole? Let’s look at a real example that happened to ESLint. In 2018, an attacker gained access to the npm account of an ESLint maintainer. They then published malicious versions of two ESLint packages which, when installed, sent users’ own npm secrets to the attacker. Fortunately, the incident was resolved within a few hours, but not before some users’ npm credentials had been exposed. As a result, npm had to revoke every active secret generated before the incident was resolved to make sure its users were protected.
Scanning for supply chain secrets matters because, unlike other secrets, where exposure affects only one account, an exposed supply chain secret can potentially affect millions of downstream applications and their users. Fortunately, GitHub secret scanning scans for npm secrets: if an ESLint maintainer leaked their npm credentials in a public GitHub repository, npm would be notified and could revoke them before an attacker had the chance to publish malicious versions of their packages, protecting every piece of software that depends on them.
GitHub secret scanning has recently added support for RubyGems and PyPI secrets. We also scan for npm, NuGet, and Clojars secrets. All of these package management services host a combined 2.3 million packages (as of May 2021).
So, what happens if you leak one of these secrets in a public repository? Let’s use RubyGems as an example. If you commit a RubyGems API key to a public repository, GitHub secret scanning automatically scans the commit. Within a few seconds, it finds the secret and notifies RubyGems of the leak. RubyGems then revokes the secret and sends you an email about the compromised key.
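The flow above can be sketched end to end as a small in-memory model. Everything here is hypothetical: `RubyGemsLikeRegistry`, `scan_commit`, and the `exgems_` key format are illustrative assumptions, not the RubyGems API, the real RubyGems key format, or GitHub’s actual partner protocol.

```python
import re
from dataclasses import dataclass, field

# Hypothetical key format for illustration; not the real RubyGems format.
KEY_PATTERN = re.compile(r"\bexgems_[0-9a-f]{48}\b")


@dataclass
class RubyGemsLikeRegistry:
    """In-memory stand-in for a registry that revokes keys it is told about."""
    active_keys: dict[str, str] = field(default_factory=dict)  # key -> owner email
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (email, message)

    def report_leak(self, key: str, commit_url: str) -> bool:
        """Revoke a reported key and email its owner; ignore keys that aren't live."""
        owner = self.active_keys.pop(key, None)  # Removing the key revokes it.
        if owner is None:
            return False  # Already revoked, or a string that merely looks like a key.
        self.outbox.append(
            (owner, f"Your API key was found at {commit_url} and has been revoked.")
        )
        return True


def scan_commit(diff: str, commit_url: str, registry: RubyGemsLikeRegistry) -> int:
    """Find candidate keys in a public commit, report each one, and count revocations."""
    revoked = 0
    for match in KEY_PATTERN.finditer(diff):
        if registry.report_leak(match.group(0), commit_url):
            revoked += 1
    return revoked
```

Calling `scan_commit` on a diff containing a live key revokes it and queues one email to its owner; reporting the same key again does nothing, because it is no longer active. Letting the registry decide whether a match is a real key reflects the division of labor described above: the scanner finds candidates, and the registry revokes and notifies.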
With each new package registry that integrates with GitHub secret scanning, the open source supply chain becomes more secure and better protected against the relatively common issue of leaked secrets. Across npm, NuGet, Clojars, and now PyPI and RubyGems, GitHub secret scanning has found and helped revoke thousands of package registry credentials leaked in public repositories over the past year. We’re not stopping here, either; we’re actively engaging with other package registries and hope to have more integrations to announce soon!
GitHub is committed to protecting developers and offering a best-in-class service, and we’re always looking to add support for new secret types (for any cloud provider, not just package management services). Recent additions beyond package management services include Adobe and OpenAI. If you’re a cloud provider and want to help keep your customers’ secrets safe, we would love to partner with you; take a look at our documentation to find out how!
To learn more about the current capabilities of GitHub secret scanning, check out the documentation.