GitHub Availability Report: August 2025
In August, we experienced three incidents that resulted in degraded performance across GitHub services.

August 5 15:42 UTC (lasting 32 minutes)
At 15:33 UTC on August 5, 2025, we initiated a production database migration to drop a column from a table backing pull request functionality. Although the column was no longer in direct use, our ORM continued to reference it in a subset of pull request queries. As a result, error rates were elevated across pushes, webhooks, notifications, and pull requests, with impact peaking at approximately 4% of all web and REST API traffic.
We mitigated the issue by deploying a change that instructed the ORM to ignore the removed column, and most affected services recovered by 16:13 UTC. However, that fix initially reached only our largest production environment; a subsequent update to some of our custom and canary environments did not include it, which triggered a secondary incident affecting roughly 0.1% of pull request traffic. That incident was fully resolved by 19:45 UTC.
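For readers unfamiliar with this failure mode, the sketch below illustrates it using SQLAlchemy as a stand-in ORM with a hypothetical `PullRequest` model (this is not GitHub's actual stack or fix): if the model keeps loading a column after the database drops it, ordinary reads start failing, so the safe order is to stop referencing the column in application code everywhere first, then run the schema migration.

```python
# Minimal sketch of the drop-column hazard, using SQLAlchemy as a stand-in
# ORM (hypothetical model; not GitHub's actual stack or fix).

from sqlalchemy import Boolean, Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, deferred

Base = declarative_base()


class PullRequest(Base):
    __tablename__ = "pull_requests"

    id = Column(Integer, primary_key=True)
    title = Column(String(255))
    # Column scheduled for removal. Marking it deferred() keeps it out of the
    # default SELECT list, which is one way to stop the ORM from referencing
    # a column before the database migration drops it. The stricter fix is to
    # delete the attribute from the model entirely and deploy that change to
    # every environment (production, canary, custom) before dropping the column.
    legacy_flag = deferred(Column(Boolean, default=False))


if __name__ == "__main__":
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(PullRequest(title="example"))
        session.commit()
        # This read path no longer touches legacy_flag, so dropping the
        # column in the database would not break it.
        print(session.execute(select(PullRequest.id, PullRequest.title)).all())
```

The same ordering applies regardless of ORM: remove or ignore the column in application code in every environment first, and only then let the `DROP COLUMN` migration run.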
While migrations already have protections, such as progressive rollouts that target validation environments first and acknowledgement gates, this incident exposed a gap in application monitoring: with that signal in place, the rollout would have been halted automatically as soon as impact was observed. We will add further automation and safeguards to prevent similar incidents without requiring human intervention. We are also already working on a way to streamline certain types of changes across environments, which would have prevented the second incident from occurring.
August 12 13:30 UTC (lasting 3 hours and 44 minutes)
On August 12, 2025, between 13:30 UTC and 17:14 UTC, GitHub search was in a degraded state. Users experienced inaccurate or incomplete results, failures to load certain pages (such as issues, pull requests, projects, and deployments), and broken components (such as Actions workflow and label filters).
Most user impact occurred between 14:00 UTC and 15:30 UTC, when up to 75% of search queries failed, and updates to search results were delayed by up to 100 minutes.
The incident was triggered by intermittent connectivity issues between our load balancers and search hosts. Retry logic initially masked these problems, but the retry queues eventually overwhelmed the load balancers, causing them to fail. The query failures were mitigated at 15:30 UTC after we throttled our search indexing pipeline to reduce load and stabilize retries. The connectivity failures were resolved at 17:14 UTC after the automated reboot of a search host, which allowed the rest of the system to recover.
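A common way to keep retries from amplifying an outage like this is a retry budget. The sketch below is an illustrative Python version of that pattern (our own hedged example, not GitHub's load balancer configuration): retries are allowed only while recent retry volume stays within a fixed fraction of regular traffic, so transient blips are absorbed but a persistent failure fails fast instead of piling up in retry queues.

```python
# Illustrative retry-budget sketch (not GitHub's implementation): cap retry
# volume relative to regular traffic so retries cannot snowball under a
# persistent connectivity failure.

import random
import time


class RetryBudget:
    def __init__(self, ratio=0.1, initial_tokens=10.0, max_tokens=100.0):
        self.ratio = ratio          # retry tokens earned per regular request
        self.tokens = initial_tokens  # small allowance so isolated errors still retry
        self.max_tokens = max_tokens

    def record_request(self):
        # Every regular request earns a fraction of a retry token.
        self.tokens = min(self.tokens + self.ratio, self.max_tokens)

    def can_retry(self):
        # Spend a whole token per retry; refuse once the budget is exhausted.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def query_with_retries(send, budget, attempts=3):
    budget.record_request()
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1 or not budget.can_retry():
                raise  # fail fast instead of queueing more retries
            # Exponential backoff with jitter spreads surviving retries out.
            time.sleep((2 ** attempt) * 0.05 * random.random())
```

Throttling the indexing pipeline during the incident served a similar purpose: it reduced the total load competing for the same connections so the remaining retries could succeed.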
We have improved internal monitors and playbooks and tuned our search cluster load balancer to reduce the chance of this failure mode recurring. We have also identified and resolved a configuration issue in our load-balancing tier that was triggering the connectivity problems.
August 27 20:35 UTC (lasting 46 minutes)
On August 27, 2025, between 20:35 and 21:17 UTC, Copilot, web, and REST API traffic experienced degraded performance. Copilot saw an average of 36% of requests fail with a peak failure rate of 77%. Approximately 2% of all non-Copilot web and REST API traffic requests failed.
This incident occurred after we initiated a production database migration to drop a column from a table backing Copilot functionality. Although the column was no longer in direct use, our ORM continued to reference it, leading to a large number of 5xx responses, a failure mode similar to the incident on August 5. At 21:15 UTC, we applied a fix to the production schema, and by 21:17 UTC, all services had fully recovered.
Work to prevent this failure mode was already underway after the August 5 incident, but it was not completed quickly enough to prevent a recurrence. We have now implemented a temporary block on all drop-column operations as an immediate mitigation while we add further safeguards to prevent similar issues in the future. We are also implementing graceful degradation so that Copilot issues will not impact other features of our product.
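As an illustration of that kind of graceful degradation, the sketch below uses a simple circuit breaker (an assumed pattern with hypothetical function names, not GitHub's actual design): when the Copilot dependency is erroring, requests skip it for a cooling-off period and serve the rest of the response instead of returning a 5xx.

```python
# Circuit-breaker sketch (assumed pattern, hypothetical names; not GitHub's
# actual design): isolate a failing Copilot dependency so the rest of the
# response can still be served.

import time


def call_copilot_backend():
    # Hypothetical upstream call; stands in for the real Copilot dependency.
    raise ConnectionError("copilot backend unavailable")


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Breaker is open: serve the fallback without touching the backend.
                return fallback()
            # Cooling-off period elapsed: allow a trial request (half-open).
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result


copilot_breaker = CircuitBreaker()


def render_page():
    # The Copilot panel degrades to an empty placeholder; the rest of the
    # response is served even while the Copilot dependency is failing.
    suggestion = copilot_breaker.call(call_copilot_backend, fallback=lambda: None)
    return {"copilot_suggestion": suggestion, "page": "rest of the response"}
```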
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.