GitHub Availability Report: August 2023
In August, we experienced two incidents that resulted in degraded performance across GitHub services.

In August, we experienced two incidents that resulted in degraded performance across GitHub services.
August 15 16:58 UTC (lasting 4 hours 29 minutes)
On August 15 at 16:58 UTC, GitHub started experiencing increasing delays in an internal job queue used to process webhooks. We statused GitHub Webhooks to yellow at 17:24 UTC. During this incident, customers experienced webhooks delays as long as 4.5 hours.
We determined that the delays were caused by a significant and sustained spike in webhook deliveries. This caused a backup of our webhooks deliveries queue. We mitigated the issue by blocking events from sources of the increased load, which allowed the system to gradually recover as we processed the backlog of events. In response to this and other recent webhooks incidents, we made improvements that allow us to handle a higher amount of traffic and absorb load spikes without increasing delivery latency. We also improved our ability to manage load sources to prevent and more quickly mitigate any impact to our service.
August 29 02:36 UTC (lasting 49 minutes)
On August 29 at 02:36 UTC, GitHub systems experienced widespread delays in background job processing. This prevented webhook deliveries, GitHub Actions, and other asynchronously-triggered workloads throughout the system from running immediately as normal. While workloads were delayed by up to an hour, no data was lost, and systems ultimately recovered and resumed timely operation.
The component of our job queueing service responsible for dispatching jobs to workers failed due to an interaction with unexpected CPU throttling and short session timeouts for a Kafka consumer group. The Kafka consumer ended up stuck in a loop, unable to stabilize fast enough before timing out and restarting the coordination process. While the service continued to accept and record incoming work, it was unable to pass jobs on to workers until we mitigated the issue by shifting the load to the standby service as well as redeploying the primary service. We have extended our monitoring to allow quicker diagnosis of this failure mode, and are pursuing additional changes to prevent reoccurrence.
Please follow our status page for real-time updates on status changes. To learn more about what we’re working on, check out the GitHub Engineering Blog.
Tags:
Written by
Related posts

Explore the best of GitHub Universe: 9 spaces built to spark creativity, connection, and joy
See what’s happening at Universe 2025, from experimental dev tools and career coaching to community-powered spaces. Save $400 on your pass with Early Bird pricing.

Agents panel: Launch Copilot coding agent tasks anywhere on GitHub
Delegate coding tasks to Copilot and track progress wherever you are on GitHub. Copilot works in the background, creates a pull request, and tags you for review when finished.

Q1 2025 Innovation Graph update: Bar chart races, data visualization on the rise, and key research
Discover the latest trends and insights on public software development activity on GitHub with the quarterly release of data for the Innovation Graph, updated through March 2025.