Today’s Outage
A few hours ago I was upgrading our continuous integration setup when a configuration error caused it to run against our production environment rather than our testing environment.
Before every run of our test suite we destroy then re-create the database so that we have a known, clean starting point. This also allows us to continuously integrate topic branches with potentially different database schemas. Due to the configuration error GitHub’s production database was destroyed then re-created. Not good.
We immediately began restoring the database from our most recent backup. Unfortunately, while most tables in the GitHub database are small, our “events” table is large. This significantly slowed the restoration process.
Eventually the decision was made to skip the events table in order to speed up the restoration process. As a result, your dashboard and profile might currently be blank – rather annoying, but hopefully only temporary. We will be restoring the events table bit by bit over the next few days in an attempt to minimize downtime.
Worse, however, is that we may have lost some data from between the last good database backup and the time of the deletion. Newly created users and repositories are being restored, but pull request state changes and similar might be gone.
Obviously, this should never have happened. It should be very difficult to cause a database failure like this and very easy to recover from it.
Our plan moving forward:
- Completely isolate the test environment from the production environment, i.e. make production hosts unreachable from the testing VM.
- Reduce the size and growth rate of our events table. This is already well underway but is now one of our top priorities.
- Begin storing binlogs to reduce data loss in the event of a future database restoration. Shrinking the events table (the previous item) will make this much easier.
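Network isolation is the real fix, but a guard inside the reset step itself is a cheap second layer. The following is a minimal sketch, not our actual tooling: the names `assert_safe_to_reset`, `reset_database`, `ALLOWED_ENVIRONMENTS` and `ALLOWED_HOSTS`, and the `_test` naming convention, are all assumptions made for illustration. The idea is that the destructive step refuses to run unless the environment, database name, and host all look like a test target.

```python
class UnsafeResetError(RuntimeError):
    """Raised when a destructive reset targets something that is not a test database."""


# Hypothetical allow-lists; a real setup would load these from its own config.
ALLOWED_ENVIRONMENTS = {"test", "ci"}
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def assert_safe_to_reset(environment, database_name, host):
    """Raise UnsafeResetError unless every check says this is a test database."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise UnsafeResetError(f"refusing to reset in environment {environment!r}")
    # Assumed convention: test databases are named with a "_test" suffix.
    if not database_name.endswith("_test"):
        raise UnsafeResetError(f"refusing to reset database {database_name!r}")
    if host not in ALLOWED_HOSTS:
        raise UnsafeResetError(f"refusing to reset database on host {host!r}")


def reset_database(environment, database_name, host, drop_and_create):
    """Run the destructive step only after all safety checks pass.

    drop_and_create is any callable that takes the database name; it is
    injected so the guard can be exercised without a real database.
    """
    assert_safe_to_reset(environment, database_name, host)
    drop_and_create(database_name)
```

With a guard like this, a misconfigured run that points at a production host fails before anything is dropped, because the check and the destructive call live in the same function and cannot be separated by a configuration change alone.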
We’re very sorry about this, especially if we ruined your work day or Sunday afternoon. Please email support@github.com if you are still having problems or need to discuss the outage further.