State of the Hub: Rackspace day 2

Monday has come and gone, and no one is happier about this than I am. As day 2 wraps up, I’d like to post an update for those of you who have been following us.

Before I get to the details, I’d like to remind everyone to open tickets on Tender if you have any issues. For those of you with open tickets, never fear: we’re burning through them as fast as we can. When I logged in this morning we had over 120 tickets in the inbox; right now we’re sitting at 35. Unfortunately, Tender doesn’t give me stats on the number of tickets opened today, but we’ve dealt with at least 85, plus however many new tickets were opened during the day. I’d bet that’s a few dozen at least. To put that in perspective, before the move we saw maybe a dozen new tickets on a stable day, and anywhere from 20 to 40 on an unstable one.

Bugs squashed today

Everyone was busy all day today killing bugs. At one point I estimate mojombo was handling up to 800 problems a minute. Here’s a rundown of the major bug fixes:

  • 502 Bad Gateway errors should, for the most part, be gone. These hit gist creation, applying commits in the fork queue, and user creation the hardest. If you run into this error, please open a ticket and describe what you were doing when you encountered it.
  • “Repo under migration” errors should also be gone. The fix included a patch to generate user paths on lookup if they didn’t already exist, plus a few batch jobs to force creation of repos that had not yet been created on disk. If you have any new, unpushed repos still throwing errors, please let us know.
  • Errors when pushing repos with submodules should be gone now.
  • A well placed kick to the mail server got outgoing emails working again.
  • User deletion should work again. I have not personally tested this one because I can’t bear to see users go.
  • New Pages were not being set up, existing Pages were not building on push, and pygments was missing on the Pages server. All of these should be fixed now. If your page has not built, make a new commit to trigger a new build. If your new page hasn’t been set up on the server, open a ticket so we can force it to run.

Outstanding issues

  • Gist forking still broken.
  • Zip/tar downloads are failing to generate for a few repos. Please open a ticket and let us know what repo so we can pin this one down.
  • Connections to ssh.github.com:443 are still not working. We’ve already applied one fix here, but it did not work. We are still working on this; firewalled folks, all I can ask is that you be patient.
  • Some gists made shortly before the lockdown on Sunday did not sync to Rackspace.
  • Gem building is still disabled. We have no ETA on this one, but we’re not afraid to shamelessly plug gemcutter in the meantime.

And now for something completely different

Since the move, we’ve seen over 1000 new users and 2000 new repos. We’ve processed over 800k background jobs, and the background job queues are blazing fast on the new servers. Before the move, our low-priority queue (network graph updates, HTTP cloning updates and some other jobs) had been backed up for a few weeks. At peak times it would rise to 40k jobs or more, and graph updates were often delayed by many hours, even with 25 job runners working nearly full time on the low-priority queue. On the new servers we have 40 workers, but they sit idle most of the time; usually only 2 or 3 are active at any given moment. The job queues, including the low-priority queue, have stayed near zero since we cut over the DNS.
