GitHub Availability Report: February 2024
In February, we experienced two incidents that resulted in degraded performance across GitHub services.
In February, we experienced two incidents that resulted in degraded performance across GitHub services.
February 26 18:34 UTC (lasting 53 minutes)
February 29 09:32 UTC (lasting 142 minutes)
On February 26 and February 29, we had two incidents related to a background job service that caused processing delays to GitHub services. The incident on February 26 lasted for 63 minutes, while the incident on February 28 lasted for 142 minutes.
The incident on February 26 was related to capacity constraints with our job queuing service and a failure of our automated failover system. Users experienced delays in Webhooks, GitHub Actions, and UI updates (for example, a delay in UI updates on pull requests). We mitigated the incident by manually failing over to our secondary cluster. No data was lost in the process.
The incident on February 29 also caused processing delays to Webhooks, GitHub Actions and GitHub Issues services, with 95% of the delays occurring in a 22-minute window between 11:05 and 11:27 UTC. At 9:32 UTC, our automated failover successfully routed traffic, but an improper restoration to the primary at 10:32 UTC caused a significant increase in queued jobs until a correction was made at 11:21 UTC and healthy services began burning down the backlog until full restoration at 11:27 UTC.
To prevent recurrence of the incidents in the short term, we have completed three significant improvements in the areas of better automation, increasing the reliability of our fallback process, and expanding the capacity of our background job queuing services based on these two incidents. For the longer term, we have a more significant effort already in progress to improve the overall scalability and reliability of our job processing platform.
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.
Tags:
Written by
Related posts
GitHub Availability Report: November 2024
In November, we experienced one incident that resulted in degraded performance across GitHub services.
The top 10 gifts for the developer in your life
Whether you’re hunting for the perfect gift for your significant other, the colleague you drew in the office gift exchange, or maybe (just maybe) even for yourself, we’ve got you covered with our top 10 gifts that any developer would love.
Congratulations to the winners of the 2024 Gaady Awards
The Gaady Awards are like the Emmy Awards for the field of digital accessibility. And, just like the Emmys, the Gaadys are a reason to celebrate! On November 21, GitHub was honored to roll out the red carpet for the accessibility community at our San Francisco headquarters.