SHA-1 collision detection on GitHub.com
A few weeks ago, researchers announced SHAttered, the first collision of the SHA-1 hash function. Starting today, all SHA-1 computations on GitHub.com will detect and reject any Git content that…
A few weeks ago, researchers announced SHAttered, the first collision of the SHA-1 hash function. Starting today, all SHA-1 computations on GitHub.com will detect and reject any Git content that shows evidence of being part of a collision attack. This ensures that GitHub cannot be used as a platform for performing collision attacks against our users.
This fix will also be included in the next patch releases for the supported versions of GitHub Enterprise.
Why does SHA-1 matter to Git?
Git stores all data in “objects.” Each object is named after the SHA-1 hash of its contents, and objects refer to each other by their SHA-1 hashes. If two distinct objects have the same hash, this is known as a collision. Git can only store one half of the colliding pair, and when following a link from one object to the colliding hash name, it can’t know which object the name was meant to point to.
Two objects colliding accidentally is exceedingly unlikely. If you had five million programmers each generating one commit per second, your chances of generating a single accidental collision before the Sun turns into a red giant and engulfs the Earth is about 50%.
Why do collisions matter for Git’s security?
If a Git fetch or push tries to send a colliding object to a repository that already contains the other half of the collision, the receiver can compare the bytes of each object, notice the problem, and reject the new object. Git has implemented this detection since its inception.
However, SHA-1 names can be assigned trust through various mechanisms. For instance, Git allows you to cryptographically sign a commit or tag. Doing so signs only the commit or tag object itself, which in turn points to other objects containing the actual file data by using their SHA-1 names. A collision in those objects could produce a signature which appears valid, but which points to different data than the signer intended. In such an attack the signer only sees one half of the collision, and the victim sees the other half.
What would a collision attack against Git look like?
The recent attack cannot generate a collision against an existing object. It can only generate a colliding pair from scratch, where the two halves of the pair are similar but contain a small section of carefully-selected random data that differs.
An attack therefore would look something like this:
-
Generate a colliding pair, where one half looks innocent and the other does something malicious. This is best done with binary files where humans are unlikely to notice the difference between the two halves (the recent attack used PDFs for this purpose).
-
Convince a project to accept your innocent half, and wait for them to sign a tag or commit that contains it.
-
Distribute a copy of the repository with the malicious half (either by breaking into a hosting server and replacing the innocent object on disk, or hosting it elsewhere and asking people to verify its integrity based on the signatures). Anybody verifying the signature will think the contents match what the project owners signed.
How is GitHub protecting against collision attacks?
Generating a collision via brute-force is computationally too expensive, and will remain so for the foreseeable future. The recent attack uses special techniques to exploit weaknesses in the SHA-1 algorithm that find a collision in much less time. These techniques leave a pattern in the bytes which can be detected when computing the SHA-1 of either half of a colliding pair.
GitHub.com now performs this detection for each SHA-1 it computes, and aborts the operation if there is evidence that the object is half of a colliding pair. That prevents attackers from using GitHub to convince a project to accept the “innocent” half of their collision, as well as preventing them from hosting the malicious half.
The actual detection code is open-source and was written by Marc Stevens (whose work is the basis of the SHAttered attack) and Dan Shumow. We are grateful for their work on that project.
Are there Git collisions?
Not yet. Git’s object names take into account not only the raw bytes of the files, but also some Git-specific header information. The PDFs provided by the SHAttered researchers collide in their raw bytes, but not when added to a Git repository. The same technique could be used to generate a Git object collision, but like the generation of the original SHAttered PDFs, it would require spending hundreds of thousands of dollars in computation.
What future work is there?
Blocking collisions that pass through GitHub is only the first step. We’ve already been working with the Git project to include the collision detection library upstream. Future versions of Git will be able to detect and reject colliding halves no matter how they reach the developer: fetching from other hosting sites, applying patches, or generating objects from local data.
The Git project is also developing a plan to transition away from SHA-1 to another, more secure hash algorithm, while minimizing the disruption to existing repository data. As that work matures, we plan to support it on GitHub.
Written by
Related posts
Inside the research: How GitHub Copilot impacts the nature of work for open source maintainers
An interview with economic researchers analyzing the causal effect of GitHub Copilot on how open source maintainers work.
OpenAI’s latest o1 model now available in GitHub Copilot and GitHub Models
The December 17 release of OpenAI’s o1 model is now available in GitHub Copilot and GitHub Models, bringing advanced coding capabilities to your workflows.
Announcing 150M developers and a new free tier for GitHub Copilot in VS Code
Come and join 150M developers on GitHub that can now code with Copilot for free in VS Code.