Today’s Downtime
Earlier today, we received an alert from our monitoring system that a disk in our active MySQL server’s RAID10 array had started to clock up some media errors. Because this was a non-critical fault at this stage, we decided to hold off on pulling the disk until after the work day had ended. Just before 7:30PM PST tonight, we logged into the machine and marked the disk as failed in the RAID controller in preparation for the disk replacement.
Due to some very unfortunate timing, the command to remove the disk from the array locked the kernel for a few moments just as our HA system was performing a health check on the active database server. The HA system marked the active database server as problematic and stepped in to fix the situation the only way it knows how: by powering down the affected machine and bringing MySQL up on the standby machine. The failover itself went very smoothly. However, due to the size of some of our tables, the InnoDB recovery process after the unclean shutdown of MySQL took the better part of 20 minutes to run.
Full services were restored by 7:45PM PST.
Now that we are aware of the issue, we have adjusted our procedure to account for it so that it doesn’t happen again.
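To illustrate the failure mode described above, here is a minimal sketch of how a health check policy can turn a brief stall into a failover. It is not our HA system's actual logic or the procedure change we made. The function names, the 2-second timeout, and the response times are all hypothetical, chosen only to show the difference between failing over on a single missed probe and requiring several consecutive misses.

```python
from typing import Iterable, List


def probe_ok(response_seconds: float, timeout_seconds: float) -> bool:
    """A probe succeeds if the server answered within the timeout."""
    return response_seconds <= timeout_seconds


def should_fail_over(probe_results: Iterable[bool], failures_required: int) -> bool:
    """Return True once `failures_required` consecutive probes have failed."""
    consecutive = 0
    for ok in probe_results:
        # Any successful probe resets the failure streak.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failures_required:
            return True
    return False


# Hypothetical response times in seconds: the third probe lands during a
# brief stall (for example, a momentary kernel lock); the rest are healthy.
latencies: List[float] = [0.05, 0.04, 6.0, 0.05, 0.06]
results = [probe_ok(t, timeout_seconds=2.0) for t in latencies]

# A policy that acts on a single missed probe fails over during the stall.
print(should_fail_over(results, failures_required=1))

# A policy that waits for three consecutive misses rides out the stall.
print(should_fail_over(results, failures_required=3))
```

The trade-off is that a higher threshold also delays failover when the active server really is down, so the right value depends on how long a genuine outage can go undetected.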