How to stay safe from repo-jacking

Repo-jacking is a specific type of supply chain attack. This blog post explains what it is, what the risk is, and what you can do to stay safe.


“Repo-jacking” is a type of supply chain attack that has received attention for its potential impact on open source software. In this blog post, I’ll explain what repo-jacking is and what you can do to stay safe. The TL;DR summary is that if you’re getting all of your software dependencies from a package manager like npm or PyPI, then you can’t be directly affected by repo-jacking. You need to be more careful if you’re pulling dependencies directly from GitHub, but there’s a simple solution: lock to a specific commit ID. I’ll explain how to do that in a few of the most common scenarios.

Supply chain attacks are, in general, a very serious concern because a successful attack could potentially deliver malware to a very large number of victims. But the chances of an attacker achieving a successful large-scale supply chain attack with repo-jacking alone are very small. The majority of software dependencies are delivered via package managers, so the most likely attack vector would be to use repo-jacking to upload a malicious package to a package manager, but package managers like npm or PyPI won’t let you do that unless you also have access to the maintainer’s credentials. And if you have access to the maintainer’s credentials then you already have the power to launch a supply chain attack, without any need for repo-jacking.

What is repo-jacking?

Repo-jacking is a specific type of software supply chain attack. A supply chain attack is when a trusted software package gets replaced with malware. For example, imagine the consequences if the “automatic update” system of a major software vendor got infiltrated: it would enable the infiltrator to distribute malware to millions of devices, pwning them all instantly. The same scenario applies if an attacker manages to replace a major open source software project with malware. For example, imagine if the kubernetes/kubernetes repository got replaced with malware. A lot of developers regularly download and build Kubernetes, so if malware was inserted into its build system then it would get run on a lot of developer laptops.

The most straightforward way that a supply chain attack could happen on a GitHub repository would be for somebody with write permission (that is, one of the project’s maintainers) to push a malicious commit. Alternatively, somebody might create a pull request and manage to sneak a malicious code change past the reviewers. GitHub has numerous protections, such as mandatory 2FA, protected branches, and access management controls, to guard against those kinds of attacks.

Repo-jacking is a supply chain attack that could happen if a GitHub user changes their username. As a purely hypothetical example, consider GitHub’s own account at https://github.com/github, which has hundreds of public repositories. Now, suppose GitHub were to change its account name to gh (see note 1). Then, a repository such as https://github.com/github/cmark-gfm would be renamed to https://github.com/gh/cmark-gfm. Now, imagine that an attacker manages to register a new GitHub account with the newly available username github. Then, they could create a repository named cmark-gfm and start serving malware to developers who are still downloading their software from the original address.

What’s the risk of repo-jacking?

GitHub uses a tombstoning algorithm to reduce the risk of repo-jacking by permanently retiring specific owner/repository name combinations (see note 2). The github/cmark-gfm example above is purely hypothetical because, in that scenario, the old name would automatically be tombstoned: even if an attacker managed to register the username github, they would still be prevented from creating a new repository named cmark-gfm, because that owner/repository combination (github/cmark-gfm) would be permanently retired. Therefore, repo-jacking is only a risk for repositories that fall below a certain usage threshold. We don’t tombstone all renamed repositories because there’s a tradeoff between usability and security: a tombstone is a potential inconvenience for our users, which we don’t want to impose unless there’s a genuine security-related reason to do so. That’s why our tombstoning policy only kicks in after the repository has met certain criteria, such as exceeding a specific number of clones.
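To make the tradeoff concrete, here is a toy model of a usage-based tombstoning decision. It is a sketch, not GitHub’s implementation: the RepoStats type, the single clone-count metric, and the CLONE_THRESHOLD value are all hypothetical, since the exact criteria aren’t spelled out here.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration; the real criteria
# are not described in this post.
CLONE_THRESHOLD = 100


@dataclass
class RepoStats:
    owner: str
    name: str
    clones: int  # hypothetical usage metric recorded before the rename


def should_tombstone(stats: RepoStats, threshold: int = CLONE_THRESHOLD) -> bool:
    """Retire the old owner/name pair only if the repository was popular enough."""
    return stats.clones > threshold


def retired_names(renamed: list[RepoStats], threshold: int = CLONE_THRESHOLD) -> set[str]:
    """Return the owner/name combinations that can never be recreated."""
    return {f"{r.owner}/{r.name}" for r in renamed if should_tombstone(r, threshold)}
```

Under this model a popular repository’s old name is retired, while a rarely cloned one is left claimable, which is exactly the residual window repo-jacking relies on.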

Therefore, the only way to achieve a successful repo-jacking attack is by targeting a project that falls under our usage thresholds, which means that the attack is unlikely to have a high impact.

Automatic redirection

A contributing factor that could increase the risk of repo-jacking is that renamed repositories are automatically redirected. Using the github/cmark-gfm example again, github/cmark-gfm would be automatically redirected to gh/cmark-gfm. In other words, operations like cloning github/cmark-gfm would continue to work. However, the redirect would be removed as soon as the new owner of the github username creates a repository named cmark-gfm. The benefit of automatic redirection is that it reduces the number of outages that a rename might cause, but it also means that people are much less likely to notice that one of their dependencies has been renamed. That, in turn, increases the number of projects that could potentially be affected by a successful repo-jacking attack.
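The interaction between renames and redirects can be sketched with a toy name resolver. This is a hypothetical model (the ToyHost class is invented for illustration, and it deliberately leaves out tombstoning and account ownership): every repository gets a permanent numeric ID, names are only pointers to IDs, and a redirect survives until someone creates a repository under the old name.

```python
import itertools


class ToyHost:
    """Toy model of a code host's name resolution (not GitHub's real logic)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)       # permanent repository IDs
        self._names: dict[str, int] = {}     # "owner/name" -> repo ID
        self._redirects: dict[str, str] = {} # old "owner/name" -> new "owner/name"

    def create(self, full_name: str) -> int:
        if full_name in self._names:
            raise ValueError(f"{full_name} already exists")
        # Creating a repository under an old name removes its redirect.
        self._redirects.pop(full_name, None)
        repo_id = next(self._ids)
        self._names[full_name] = repo_id
        return repo_id

    def rename(self, old: str, new: str) -> None:
        if new in self._names:
            raise ValueError(f"{new} already exists")
        self._names[new] = self._names.pop(old)
        self._redirects.pop(new, None)
        self._redirects[old] = new  # keep old clone URLs working

    def resolve(self, full_name: str) -> int:
        """Return the ID of the repository a clone of full_name would fetch."""
        while full_name not in self._names:
            full_name = self._redirects[full_name]  # KeyError if nothing is there
        return self._names[full_name]


if __name__ == "__main__":
    host = ToyHost()
    original = host.create("github/cmark-gfm")
    host.rename("github/cmark-gfm", "gh/cmark-gfm")
    assert host.resolve("github/cmark-gfm") == original  # redirect still works
    impostor = host.create("github/cmark-gfm")            # old name reclaimed
    assert host.resolve("github/cmark-gfm") == impostor   # same URL, different repo
```

The point of the demo is that a consumer who only knows the name sees no error at any step, which is why the redirect makes renames easy to miss.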

Package managers

Software is often distributed via package managers (rather than being downloaded directly from GitHub), which creates an extra layer of defense against repo-jacking. Examples of package managers are npm (JavaScript), PyPI (Python), and crates.io (Rust). To replace a package on a package manager with malware, you would typically need to gain (unauthorized) access to the maintainer’s account, for example, by guessing their password. That’s a separate issue from repo-jacking. Also, an attacker who has access to the maintainer’s account can probably upload malware directly, without any need for repo-jacking.

There is one situation in which a package manager could accidentally amplify a repo-jacking attack, and that’s if it automatically pulls the latest version from GitHub. That happened in May 2022 when a security researcher repo-jacked the hautelook/phpass repository. Packagist, a package manager for PHP, automatically crawls GitHub for new releases and publishes them, so the researcher was able to upload malware purely through repo-jacking. Packagist was recently updated to eliminate the risk of that happening again.

Other package managers such as PyPI and npm work differently: you cannot upload a new version of a package without first successfully authenticating with the package manager. In other words, repo-jacking is insufficient by itself to upload malware to PyPI or npm.

Downloading software directly from GitHub

Repo-jacking is a more realistic concern if you or your project’s build system is downloading software directly from GitHub, rather than from a package manager. The three most likely reasons why you might be doing that are:

  1. You’re using GitHub Actions.
  2. You’re using the Go programming language.
  3. You’re using git submodules.

All three enable you to directly reference another GitHub repository. For example, in a GitHub Actions workflow file, you can use an open source action like this:

steps:
    - uses: actions/javascript-action@v1.0.1

That will download and run code from the actions/javascript-action repository. Similarly, it is quite common in the Go programming language to depend on code directly from other GitHub repositories by listing them in a go.mod file like this:

require (
        github.com/google/uuid v1.3.0
)

The simplest way to avoid repo-jacking when you’re downloading software directly from GitHub is to reference a specific commit ID. Since Git commit IDs are SHA-1 hashes, an attacker would have to break SHA-1 (theoretically possible, but very unlikely) to replace the code with malware. If you’re using Git submodules, then you don’t need to worry because submodules are always locked to a specific commit ID: you only need to be careful when you update the submodule to a new version. If you’re using Go, then you can use a go.sum file to lock cryptographic hashes of all your dependencies. And finally, if you’re using GitHub Actions, you can use this syntax to lock a specific commit ID:

steps:
    - uses: actions/javascript-action@4be183afbd08ddadedcf09f17e8e112326894107
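In a repository with many workflows, it can help to find the steps that still reference a tag or branch instead of a commit ID. The following is a minimal sketch, assuming workflow files are simple enough that a line-by-line regular expression finds every uses: key; it is not a YAML parser, and the helper name unpinned_actions is hypothetical.

```python
import re

# Matches "uses: owner/repo[/path]@ref", optionally quoted and optionally
# preceded by a list dash. This is a heuristic, not a YAML parser.
USES_RE = re.compile(r"""^\s*-?\s*uses:\s*['"]?([^\s#@'"]+)@([^\s#'"]+)""")
# A full Git commit ID: 40 lowercase hex characters (SHA-1).
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return uses: references whose ref is not a full 40-character commit ID."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match is None:
            continue
        action, ref = match.groups()
        if action.startswith("docker://"):
            continue  # Docker image references are out of scope for this sketch
        if not FULL_SHA_RE.match(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned
```

Run against the two examples above, it would flag actions/javascript-action@v1.0.1 and accept the version pinned to 4be183afbd08ddadedcf09f17e8e112326894107.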

Using the repo ID to check for repo-jacking

The advice in this section is primarily aimed at developers who’re implementing a system (such as a package manager) that interacts with GitHub’s API, but the code is very simple so you could easily use it to check your own dependencies too. Using GitHub’s API, it is very easy to check if a repository has been renamed or replaced. Besides its name, every repository has a unique repository ID, which you can retrieve using the “Get a repository” API. The ID is a unique integer, which does not change if the repo is renamed. Repo-jacking involves replacing the original repository with a new one, which causes the repo ID to change. Therefore, it’s easy to detect repo-jacking by checking that the repo ID hasn’t changed. Here’s an example of how to do that using the PyGithub Python library:

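from github import Github


def check_repo_id(full_name, expected_id):
    # Unauthenticated client: fine for occasional checks, but rate-limited.
    gh = Github()
    repo = gh.get_repo(full_name)
    # The numeric ID survives a rename but changes if the repo is replaced.
    if repo.id != expected_id:
        raise Exception(f'repo ID has changed to {repo.id}')
    # A rename is not repo-jacking by itself, but it is worth knowing about.
    if repo.full_name != full_name:
        raise Exception(f'repo has been renamed to {repo.full_name}')


check_repo_id("github/cmark-gfm", 75244322)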

The function check_repo_id uses GitHub’s API to get the repository ID and raises an exception if it doesn’t match the expected value. As an additional precaution, it also checks that the repository hasn’t been renamed.
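If you’d rather not add a dependency on PyGithub, or you want to check many repositories at once, the same idea can be sketched with only the standard library. This version assumes you keep a hypothetical “lock” mapping of owner/name to the repo ID you recorded when you first added each dependency; the function names are illustrative. The comparison is split from the network call so it can be tested offline, and it compares names case-insensitively because GitHub treats owner and repository names that way.

```python
import json
import urllib.request


def fetch_repo_metadata(full_name: str) -> dict:
    """Call the "Get a repository" endpoint (unauthenticated, so rate-limited)."""
    url = f"https://api.github.com/repos/{full_name}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    # urllib follows HTTP redirects, so a renamed repository resolves to its new name.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_problems(metadata: dict, full_name: str, expected_id: int) -> list[str]:
    """Compare API metadata with the recorded name and ID; [] means no problems."""
    problems = []
    if metadata["id"] != expected_id:
        problems.append(f"{full_name}: repo ID changed from {expected_id} to {metadata['id']}")
    if metadata["full_name"].lower() != full_name.lower():
        problems.append(f"{full_name}: renamed to {metadata['full_name']}")
    return problems


def check_all(lock: dict[str, int]) -> list[str]:
    """Check every owner/name -> repo ID entry in a hypothetical lock mapping."""
    problems = []
    for full_name, expected_id in lock.items():
        problems += find_problems(fetch_repo_metadata(full_name), full_name, expected_id)
    return problems
```

For example, check_all({"github/cmark-gfm": 75244322}) should return an empty list as long as that repository has been neither renamed nor replaced.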

Conclusion

Repo-jacking is a supply chain attack that could happen if a GitHub user decides to change their username; however, GitHub uses a tombstoning algorithm to reduce the risk of repo-jacking by permanently retiring the names of popular repositories when they get renamed. Software is often downloaded from package managers, rather than from GitHub directly, which means that the package manager creates an extra layer of protection against repo-jacking. If your project has dependencies that are directly downloaded from GitHub, then the simplest way to stay safe is by locking to a specific commit ID.

A very exciting new advance in supply chain security, which I haven’t even touched on in this blog post, is the work that’s being done on build provenance. GitHub workflows can now use OpenID Connect (OIDC) to securely upload build artifacts to a package manager. The workflow sends an OIDC token to the package manager containing information that the package manager can use to decide whether to publish a new version of the package. For example, the token includes the name of the repository and the workflow file that was run. The repository ID is also included, which means that it prevents repo-jacking (as well as many other forms of supply chain attack). OIDC build provenance has already been integrated into npm, PyPI, Homebrew, and RubyGems.org.
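To illustrate why the repository ID matters here, the sketch below shows the kind of comparison a package manager might make once it has verified an OIDC token. It is hypothetical: the TrustedPublisher record and may_publish function are invented for illustration, and real registries each implement this differently. It assumes the token’s signature, issuer, audience, and expiry have already been checked, and it reads the repository, repository_id, and workflow_ref claims that GitHub Actions includes in its tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedPublisher:
    """Hypothetical record a registry stores when publishing is configured."""
    repository: str     # e.g. "octo-org/octo-lib"
    repository_id: str  # GitHub's numeric repo ID, compared as a string
    workflow_file: str  # e.g. "release.yml"


def may_publish(claims: dict, trusted: TrustedPublisher) -> bool:
    """Compare already-verified OIDC claims with the trusted publisher record."""
    # A repository recreated under the same name gets a new ID, so a
    # repo-jacked repository fails here even though its name matches.
    if str(claims.get("repository_id")) != trusted.repository_id:
        return False
    if claims.get("repository", "").lower() != trusted.repository.lower():
        return False
    # workflow_ref looks like "owner/repo/.github/workflows/release.yml@refs/tags/v1".
    workflow_path = claims.get("workflow_ref", "").split("@", 1)[0]
    return workflow_path.endswith(f"/.github/workflows/{trusted.workflow_file}")
```

Checking the name as well as the ID is a conservative choice: a legitimately renamed repository would also be rejected until the maintainer updates the trusted publisher record.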

Another important development in supply chain security is the cross-organization collaboration on the Supply-chain Levels for Software Artifacts (SLSA) framework. The website https://slsa.dev/ is worth a read.


Notes


  1. The username https://github.com/gh is already taken, so this isn’t actually possible. 
  2. Learn more about how we permanently retire repository names in our documentation.

Written by

Kevin Backhouse

@kevinbackhouse

I'm a security researcher on the GitHub Security Lab team. I try to help make open source software more secure by searching for vulnerabilities and working with maintainers to get them fixed.
