A Note on Today’s Outage
We had an outage this morning from 06:32 to 07:42 PDT. One of the file servers came under unusually high load, which caused the heartbeat monitor on that file server pair to behave abnormally and confuse the dynamic hostname that points to the active file server in the pair. The frontends then started timing out and were removed from the load balancer. Here is what we intend to do to prevent this from happening in the future:
- The slave file servers are still in standby mode from the migration. We will have a maintenance window tonight at 22:00 PDT to ensure that the slaves are ready to take over as master should the existing masters exhibit this kind of behavior.
- To identify the root cause of the load spikes, we will be enabling process accounting on the file servers so that we can see which processes are causing the high load.
- As a related item, the site still gives a “connection refused” error when all the frontends are out of load balancer rotation. We are working on determining why the placeholder site that should be shown during this type of outage is not being brought up.
- We’ve also identified a problem with our approach of using a single Unix domain socket upstream in Nginx. By default, any upstream failure causes Nginx to consider that upstream defunct and remove it from service for a short period. With only a single upstream, there is nothing left to serve requests during that period. We are testing a change to the configuration that should make Nginx always try upstreams.
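To illustrate the last item, here is a minimal Python sketch of the failure accounting described above. It is a toy model, not Nginx code and not our actual configuration: the `Upstream` class, the `pick` helper, and the socket path are hypothetical names invented for the example. The `max_fails` and `fail_timeout` fields mirror the parameters of the same name on Nginx's `server` directive, with their documented defaults of 1 and 10 seconds, and a `max_fails` of 0 disables the accounting. Whether that is the exact change being tested is not something this sketch claims.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Upstream:
    """Toy model of one upstream and its passive failure accounting."""
    name: str
    max_fails: int = 1          # failures allowed per window; 0 disables accounting
    fail_timeout: float = 10.0  # window length and time out of service, in seconds
    fails: int = 0
    window_start: float = 0.0
    down_until: float = 0.0

    def available(self, now: float) -> bool:
        return now >= self.down_until

    def record_failure(self, now: float) -> None:
        if self.max_fails == 0:
            return  # never mark this upstream as unavailable
        # Start a fresh window if there is none or the old one has expired.
        if self.fails == 0 or now - self.window_start > self.fail_timeout:
            self.fails = 0
            self.window_start = now
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.fails = 0


def pick(upstreams: List[Upstream], now: float) -> Optional[Upstream]:
    """Return the first available upstream, or None if all are out of service."""
    for upstream in upstreams:
        if upstream.available(now):
            return upstream
    return None


# Hypothetical socket path, for illustration only.
strict = Upstream("unix:/tmp/app.sock")
strict.record_failure(now=0.0)
print(pick([strict], now=1.0))   # nothing to route to while the upstream is marked down

lenient = Upstream("unix:/tmp/app.sock", max_fails=0)
lenient.record_failure(now=0.0)
print(pick([lenient], now=1.0) is lenient)  # still tried despite the failure
```

In this model, a single failure with the default settings leaves the only upstream unavailable for the whole `fail_timeout` window, which matches the symptom described above. With `max_fails` set to 0, the same failure has no effect on routing.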
We apologize for the downtime and any inconvenience it may have caused. Thank you for your patience and understanding as we continue to refine our Rackspace setup and deal with unanticipated events.