Surviving the SSHpocalypse
Over the past few days, we have had some issues with our SSH infrastructure affecting a small number of Git SSH operations. We apologize for the inconvenience, and are happy to report that we’ve completed one round of architectural changes in order to make sure our SSH servers keep their sparkle.
As we’ve said before, we use GitHub to build GitHub, so the recent intermittent SSH connection failures have been affecting us as well.
Before today, every Git operation over SSH would open its own connection to our MySQL database during the authentication step. In the past this wasn't a problem, but we've started seeing sporadic issues as our SSH traffic has grown.
Realizing we were potentially on the cusp of a more serious situation, we patched our SSH servers to increase timeouts, retry connections to the database, and verbosely log failures. After this initial pass of incremental changes aimed at pinpointing the source of the problem, we realized this piece of our infrastructure wasn't as easy to modify as we would have liked. We decided to take a more drastic approach.
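For the curious, here's a minimal sketch of the kind of mitigation described above: connecting to MySQL with an explicit timeout, retrying a few times with a short back-off, and logging each failure. This is not GitHub's actual patch; the function name, host, credentials, and retry limits are illustrative assumptions.

```c
#include <stdio.h>
#include <unistd.h>
#include <mysql/mysql.h>

/* Connect to MySQL with a hard timeout, retrying with back-off and
 * logging each failure, instead of letting a hung connection stall
 * the whole SSH authentication step. */
MYSQL *connect_with_retries(const char *host, const char *user,
                            const char *password, const char *db)
{
    unsigned int timeout = 5;   /* seconds; illustrative value */
    int attempts = 3;

    for (int i = 1; i <= attempts; i++) {
        MYSQL *conn = mysql_init(NULL);
        if (conn == NULL)
            return NULL;

        mysql_options(conn, MYSQL_OPT_CONNECT_TIMEOUT, &timeout);

        if (mysql_real_connect(conn, host, user, password, db, 0, NULL, 0) != NULL)
            return conn;   /* success */

        /* Verbosely log the failure before backing off and retrying. */
        fprintf(stderr, "mysql connect attempt %d/%d failed: %s\n",
                i, attempts, mysql_error(conn));
        mysql_close(conn);
        sleep(1u << i);    /* 2s, 4s, 8s back-off */
    }

    return NULL;
}
```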
Starting on Tuesday, I worked with @jnewland to retire our 4+ year-old SSH patches and rewrite them all from scratch. Rather than opening a database connection for each SSH client, we call out to a shared library plugin (written in C) that lives in our Rails app. The library uses an HTTP endpoint exposed by our Rails app in order to check for authorized public keys. The Rails app is backed by a web server with persistent database connections, which keeps us from creating unbounded database connections, as we were doing previously. This is pretty neat because, like all code that lives in the GitHub Rails app, we can redeploy it near-instantly at any time. This gives us tremendous flexibility in continuing to scale our SSH services.
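To make the new shape of things concrete, here's a rough sketch of what an authorized-keys lookup over HTTP can look like from C, using libcurl. This is not the actual plugin; the endpoint URL, query parameter, response format, and function names are assumptions for illustration. The idea is simply that the SSH side asks the Rails app for a user's public keys over HTTP instead of opening its own database connection.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

struct buffer {
    char  *data;
    size_t len;
};

/* libcurl write callback: append the response body to a growable buffer. */
static size_t append_chunk(void *chunk, size_t size, size_t nmemb, void *userp)
{
    size_t total = size * nmemb;
    struct buffer *buf = userp;
    char *grown = realloc(buf->data, buf->len + total + 1);

    if (grown == NULL)
        return 0; /* signal an error to libcurl */

    buf->data = grown;
    memcpy(buf->data + buf->len, chunk, total);
    buf->len += total;
    buf->data[buf->len] = '\0';
    return total;
}

/* Fetch the authorized public keys for a user from a hypothetical HTTP
 * endpoint and return them as a newline-separated string (caller frees),
 * or NULL on failure. */
char *fetch_authorized_keys(const char *endpoint, const char *username)
{
    CURL *curl = curl_easy_init();
    struct buffer buf = { NULL, 0 };
    char url[512];
    long status = 0;

    if (curl == NULL)
        return NULL;

    snprintf(url, sizeof(url), "%s?username=%s", endpoint, username);
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_chunk);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buf);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 5L); /* fail fast rather than hang */

    if (curl_easy_perform(curl) != CURLE_OK ||
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status) != CURLE_OK ||
        status != 200) {
        free(buf.data);
        buf.data = NULL;
    }

    curl_easy_cleanup(curl);
    return buf.data;
}
```

Because the web server behind that endpoint holds persistent database connections, the number of MySQL connections stays bounded no matter how many SSH clients show up at once.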
@jnewland deployed the changes around 9:20am Thursday and things seem to be in much better shape now. Below is a graph that shows connections to the MySQL database. You can see a drastic reduction in the number of database connections:
You can also observe an overall smaller number of SSH server processes (they’re not all stuck because of contention on the database server anymore):
Of course, we are also exploring additional scalability improvements in this area.
Anywho, sorry for the mess. As always, please ping our support team if you see Git over SSH hanging up randomly on github.com.