At GitHub Universe 2019, we introduced the GitHub Archive Program along with the GitHub Arctic Code Vault. We set out to preserve open source software for future generations by storing your code in an archive built to last a thousand years. Now is the time: the GitHub Arctic Code Vault is in production.

What’s in the vault?

On February 2, 2020, we took a snapshot of all active public repositories on GitHub to archive in the vault. The snapshot includes repositories with:

  • Any commits between the announcement at Universe on November 13, 2019, and February 2, 2020
  • At least one star and any commits in the year before the snapshot (February 3, 2019, to February 2, 2020)
  • At least 250 stars, regardless of when their most recent activity occurred
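
For readers who think in code, the three criteria above can be sketched as a single predicate. This is only an illustration, not the selection code GitHub actually ran: the `Repo` type, its `stars` and `last_commit` fields, and the `in_snapshot` function are hypothetical names, and the sketch assumes the repository has already been filtered to public ones and that its most recent commit date is enough to decide whether it has "any commits" in a window that ends on the snapshot date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Dates taken from the criteria listed above.
UNIVERSE_ANNOUNCEMENT = date(2019, 11, 13)
YEAR_WINDOW_START = date(2019, 2, 3)
SNAPSHOT_DATE = date(2020, 2, 2)


@dataclass
class Repo:
    """Hypothetical, minimal view of a public repository."""
    stars: int
    last_commit: Optional[date]  # None if the repository has no commits


def in_snapshot(repo: Repo) -> bool:
    """Return True if the repo meets at least one of the three criteria."""
    last = repo.last_commit

    # Criterion 1: any commits between the Universe announcement and the snapshot.
    committed_since_universe = (
        last is not None and UNIVERSE_ANNOUNCEMENT <= last <= SNAPSHOT_DATE
    )

    # Criterion 2: at least one star and any commits in the year before the snapshot.
    committed_in_last_year = (
        last is not None and YEAR_WINDOW_START <= last <= SNAPSHOT_DATE
    )
    starred_and_recent = repo.stars >= 1 and committed_in_last_year

    # Criterion 3: at least 250 stars, regardless of recent activity.
    widely_starred = repo.stars >= 250

    return committed_since_universe or starred_and_recent or widely_starred
```

Under these assumptions, a repository with no stars and a commit in December 2019 qualifies, one with a single star whose last commit was in 2018 does not, and one with 250 stars qualifies no matter when it was last touched.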

Learn more about the snapshot criteria

A guide for the future

With archives around the world and an arctic vault full of code, we wanted to provide context and direction with a guide that’s included in every archive. The human-readable index and guide itemizes the location of each repository and explains how to recover the data. The guide provides an overview of what software is, an explanation of open source and its ethos, and a technical overview of how to unpack the archive’s contents.

On January 23, we open sourced draft 0.1 of the guide, but we need your help to improve it. Take a look and submit a pull request in the GitHub Archive Program repository by midnight on February 29, 2020.

Help us improve the guide

The Archive Program advisory board

We gathered an advisory board of experts in anthropology, archaeology, archiving, history, linguistics, science, and long-term projects to help us maximize the archive’s value for future generations.

We held our first Advisory Summit on January 16-17. After examining the archive program, the advisory board identified three significant themes:

  • Visualization: Include visual representations within the archive content itself, such as showing scientific plotting code next to its results, and make the physical artifacts of the archive visually striking and aesthetically appealing.
  • Metadata: Include repository metadata (description, language, commit logs, associated wiki, etc.) and relevant larger-scale metadata, such as the last several State of the Octoverse reports and a snapshot of Wikipedia.
  • Redundancy: Create smaller “fractional” deposits of the archive, with contents such as the 10,000 most-starred and most-depended-upon repositories, along with a small random sample of other repositories. We’re also donating copies of those deposits to prominent archives and libraries worldwide, such as Oxford’s Bodleian Library.

What’s next?

Today we begin production of the Arctic Code Vault, which will take about two months to complete. In the spring, we’ll return to Svalbard to make the official deposit of the Arctic Code Vault in the Arctic World Archive.

Join us at our booth at Satellite in May 2020, where we’ll share more about the Archive Program and the importance of preserving the software we collectively create today for future generations. 

Thank you to the open source community for all your contributions.

Learn more about the GitHub Archive Program