GitHub Availability Report: October 2023
In October, we experienced two incidents that resulted in degraded performance across GitHub services.
October 17 10:59 UTC (lasting 2 hours and 49 minutes)
From 10:59 UTC to 13:48 UTC on October 17, the GitHub Codespaces service was degraded due to an authentication outage. The issue affected 67% of users during this period, who saw failures when creating and starting their Codespaces. The regional authentication layer was throttled by a global third-party dependency because of increased load from onboarding a new Codespaces region. The Codespaces team mitigated the issue manually by reducing load on the external dependency. Following the incident, the team is evaluating and implementing scaling improvements to make the service more resilient to increasing demand. These include regional-level caching to minimize calls to the dependency, and measures to keep the authentication service healthy when errors occur.
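As a rough illustration of how regional-level caching can reduce calls to a rate-limited dependency, here is a minimal sketch. It is not the Codespaces implementation: the class name `RegionalTTLCache`, the `fetch` callable, and the choice to serve a stale value when the dependency fails are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple


class RegionalTTLCache:
    """Hypothetical per-region cache in front of a rate-limited dependency."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stands in for the call to the external dependency
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: the dependency is not called
        try:
            value = self._fetch(key)
        except Exception:
            if hit is not None:
                # Assumption: serving an expired value is acceptable while
                # the dependency is erroring or throttling.
                return hit[1]
            raise
        self._entries[key] = (now, value)
        return value
```

With a cache like this in each region, repeated lookups for the same key within the TTL produce one call to the dependency instead of many. Whether stale values may be served during errors is a policy decision that depends on what is being cached.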
October 25 09:13 UTC (lasting 3 hours and 27 minutes cumulatively)
On October 25 and 26, GitHub Copilot experienced multiple short, partial outages that affected code completions.
GitHub Copilot completions are currently hosted in multiple regions globally. Users are typically routed to the nearest geographic region, but may be routed to other regions when the nearest region is unhealthy. At 09:13 UTC on October 25, GitHub Copilot began experiencing partial outages of individual regions, each lasting approximately 12 minutes. The outages occurred because an automated process was upgrading the nodes that host the completion model, and a subset of GitHub Copilot users experienced completion errors during this time. The issue was fully resolved at 02:40 UTC on October 26.
To prevent similar outages in the future, we have taken steps to disable the automated upgrade behavior that we identified as the root cause, and we are prioritizing improvements to our global load balancing during regional outages.
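The routing behavior described above (nearest region first, another region when the nearest is unhealthy) can be sketched as follows. This is an illustrative model only, not GitHub's load balancer: the `Region` type, its `distance_km` and `healthy` fields, and the `pick_region` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str           # hypothetical region label
    distance_km: float  # assumed measure of proximity to the user
    healthy: bool       # assumed result of a health check


def pick_region(regions: List[Region]) -> Optional[Region]:
    """Return the nearest healthy region, or None if every region is unhealthy."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda r: r.distance_km)


regions = [
    Region("region-a", 120.0, healthy=False),  # nearest, but mid-upgrade
    Region("region-b", 4800.0, healthy=True),
    Region("region-c", 9100.0, healthy=True),
]
chosen = pick_region(regions)
print(chosen.name if chosen else "no healthy region")
```

In this model, a request only fails when no region is healthy or when the health signal lags behind the actual state of a region, which is one reason fast and accurate health detection matters during rolling upgrades.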
Please follow our status page for real-time updates on status changes. To learn more about what we’re working on, check out the GitHub Engineering Blog.