GitHub Archive Program: the journey of the world’s open source code to the Arctic

Image of Julia Metcalf

Your code is safe and sound in the Arctic 

At GitHub Universe 2019, we introduced the GitHub Archive Program along with the GitHub Arctic Code Vault. Our mission is to preserve open source software for future generations by storing your code in an archive built to last a thousand years.

On February 2, 2020, we took a snapshot of all active public repositories on GitHub to archive in the vault. Over the last several months, our archive partner Piql wrote 21TB of repository data to 186 reels of piqlFilm (digital photosensitive archival film). Our original plan was for our team to fly to Norway and personally escort the world’s open source code to the Arctic, but as the world continues to endure a global pandemic, we had to adjust our plans. We stayed in close contact with our partners, waiting for the time when it was safe for them to travel to Svalbard. We’re happy to report that the code was successfully deposited in the Arctic Code Vault on July 8, 2020.

Join us as we follow the code in its journey to the Arctic, and take a look at a few other things we’ve been up to here at the GitHub Archive Program. 

PiqlFilm is digital photosensitive archival film

The journey of the world’s open source code to the Arctic Circle  

Your code’s journey began at Piql’s facility in Drammen, Norway, where the boxes holding the 186 film reels were shipped to Oslo Airport and then loaded into the belly of the plane that provides passenger service to Svalbard. Svalbard, roughly 600 miles (1,000 km) north of the European mainland, just recently opened up to visitors from countries within the Schengen Area and the European Economic Area.

Boxes with 186 film reels, shipped to Oslo Airport and on to Svalbard.

The code landed in Longyearbyen, a town of a few thousand people on Svalbard, where our boxes were met by a local logistics company and taken into intermediate secure storage overnight. The next morning, the code traveled to the decommissioned coal mine set in the mountain, and then to a chamber deep inside hundreds of meters of permafrost, where it now resides, fulfilling its mission of preserving the world’s open source code for over 1,000 years.

Code stops over for intermediate secure storage overnight.

Introducing the Arctic Code Vault Badge   

Millions of developers around the world contributed to the open source software now stored in the Arctic Code Vault. To recognize and celebrate these contributions, we designed the Arctic Code Vault Badge, which is shown in the highlights section of a developer’s profile on GitHub. Hover and you can discover some of the repositories an individual contributed to.

Profiles now display the Arctic Code Vault Contributor badge in the "Highlights" section.

Hover over the badge to see a user's contribution to the Code Vault.

An update from our Archive Program partners 

Internet Archive

The Internet Archive is a well-known, widely beloved non-profit digital library which provides free public access to collections of digitized materials. In partnership with the GitHub Archive Program, the Internet Archive (IA) commenced its ongoing archive of GitHub public repositories on April 13 of this year. At present, IA is using a two-pronged approach. First, their well-known Wayback Machine is accessing and archiving raw GitHub data as WARCs, or Web ARChive files. As of this writing they have archived some 55TB of data. Second, they have the goal of making entire archived GitHub repositories available via “git clone,” while also keeping repo comments, issues, and other metadata easily accessible on the web. This second initiative is well underway and initial archiving is expected to commence this month.

Software Heritage Foundation

Software Heritage is a non-profit, multi-stakeholder initiative launched by Inria in collaboration with UNESCO with the goal of collecting, preserving, and sharing the source code of our software commons. They already archive more than 130 million projects, with their full development history, and we are delighted to announce that 100 million of these are from GitHub. Thanks to the collaboration announced at GitHub Universe 2019, the archival engine is being improved with the goal of keeping it up to speed with GitHub’s growth. If the project you are interested in, or its latest version, is not archived yet, you do not need to wait: it’s easy to trigger its archival right now in a few clicks on https://save.softwareheritage.org.
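If you would rather script that request than click through the site, here is a minimal sketch in Python. It assumes Software Heritage's public Web API exposes a "save" endpoint at the path shown; that path, the helper names, and the example repository are assumptions for illustration and should be checked against Software Heritage's own API documentation before use.

```python
from urllib.request import Request, urlopen

# Assumed base URL of the Software Heritage Web API (verify against its docs).
SWH_API = "https://archive.softwareheritage.org/api/1"


def save_request_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the (assumed) save-request endpoint URL for one repository."""
    # The origin URL is appended as-is, without percent-encoding.
    return f"{SWH_API}/origin/save/{visit_type}/url/{origin_url.rstrip('/')}/"


def request_archival(origin_url: str) -> int:
    """Send the save request and return the HTTP status code (needs network)."""
    req = Request(save_request_url(origin_url), method="POST")
    with urlopen(req) as resp:
        return resp.status


# Hypothetical example repository; building the URL makes no network call.
url = save_request_url("https://github.com/octocat/Hello-World")
```

Only `request_archival` touches the network, so you can inspect the generated URL first. The service may rate-limit or reject requests, so treat the status code as something to check rather than assume.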

Project Silica 

Project Silica is developing the first storage technology designed and built from the media up for cloud-scale storage of long-lived data. By leveraging recent discoveries in ultrafast laser optics, data is stored in quartz glass through a process that permanently changes the physical structure of the glass material. Quartz glass is a durable storage medium that offers unparalleled data lifetimes of upwards of tens of thousands of years. It is resilient to electromagnetic interference, water, and heat, making it the ideal storage medium for ensuring the world’s open source software is preserved for future generations. As a partner in the GitHub Archive Program, Project Silica is committed to driving storage innovation and developing a sustainable and reliable storage technology for the world’s long-lived data. We’ve archived 6,000 of the world’s most popular repositories as a proof of concept for future archives.

What’s next? 

Code, culture, history, and technology: The Tech Tree

Every reel of the archive includes a copy of the “Guide to the GitHub Code Vault” in five languages, written with input from GitHub’s community and available at the Archive Program’s own GitHub repository. In addition, the archive will include a separate human-readable reel which documents the technical history and cultural context of the archive’s contents. We call this the Tech Tree. 

Inspired by the Long Now’s Manual for Civilization, the Tech Tree will consist primarily of existing works, selected to provide a detailed understanding of modern computing, open source and its applications, modern software development, popular programming languages, etc. It will also include works which explain the many layers of technical foundations that make software possible: microprocessors, networking, electronics, semiconductors, and even pre-industrial technologies. This will allow the archive’s inheritors to better understand today’s world and its technologies, and may even help them recreate computers to use the archived software.

Encapsulating the world’s cultural context and technical history is a challenging prospect, and we expect the Tech Tree to evolve and iterate over time. We will soon publish an initial draft list of works selected for the Tech Tree to the Archive Program’s GitHub repository, along with, importantly, a request for community input. We look forward to incorporating ideas and suggestions from the GitHub community before the Tech Tree is added to the Arctic Code Vault.