How the Grafana Alerting team scales their issue management with GitHub Projects
Hear from Grafana Labs' Armand Grillet about how his team uses GitHub Projects.
Better late than never, right? As we get ready to upgrade our servers I thought it'd be a good time to upgrade our deployment process. Currently pushing out a new…
Better late than never, right? As we get ready to upgrade our servers I thought it’d be a good time to upgrade our deployment process. Currently pushing out a new version of GitHub takes upwards of 15 minutes. Ouch. My goal: one minute deploys (excluding server restart time).
We currently use Capistrano with a 400 line deploy.rb file. Engine Yard provides a handful of useful Cap tasks (in gem form) that we use along with many of the built-in features. We also use the fast_remote_cache deployment strategy and have written a handful (400 lines or so) of our own tasks to manage things like our service hooks or SVN importer.
As you may know, Capistrano keeps a releases directory where it creates timestamped versions of your app. All your daemons and processes then assume your app lives under a directory called current which is actually a symlink to the latest timestamped version of your app in releases. When you deploy a new version of your app, it’s put into a new timestamped directory under releases. After all the heavy lifting is done the current symlink is switched to it.
Which was really great. Before Git. So I went digging.
First I investigated Vlad the Deployer, the Capistrano alternative in Ruby. I like that it’s built on Rake but it seems to make the same assumptions as Capistrano. Basically both of these tools are modular and built in such a way that they work the same whether you’re using Subversion, Perforce, or Git. Which is great if you’re using SVN but unfortunate if you’re using Git.
For example, this is from Vlad’s included Git deployment strategy:
When you deploy a new copy of your app, Vlad removes the existing copy and does a full clone to get a new version. Capistrano does something similar by default but has a bundled “remote_cache” strategy that is a bit smarter: it caches the Git repo and does a fetch then a reset. It still has to then copy the updated version of your app into a timestamped directory and switch the symlink, but it’s able to cut down on time spent pulling redundant objects. It even knows about the depth option.
The next thing I looked at was Heroku’s rush. It lets you drive servers (even clusters of them) using Ruby over SSH, which looked very promising. Maybe I’d write a little git-deploy script based on it.
Unfortunately for me Rush needs to be installed on every server you’re managing. It also needs a running instance of rushd. Which makes sense – it’s a super powerful library – but that wouldn’t work for deploying GitHub.
Fabric is a library I first heard about back in February. It’s like Capistrano or Vlad but with more emphasis on being a framework/tool for remote management of servers. Easy deployment scripts are just a side effect of that mentality.
It’s very powerful and after playing with it for a while I was extremely pleased. I’ll definitely be using it in all my Python projects. However, I wasn’t looking forward to porting all our custom Capistrano tasks to Python. Also, though I love Python, we’re mostly a Ruby shop and everyone needs to be able to add, debug, and modify our deploy scripts with ease.
Playing with Fabric did inspire me, though. Capistrano is basically a tool for remote server management, too, if you think about it. We may have outgrown its ideas about deployment but I can always write my own deployment code using Capistrano’s ssh and clustering capabilities. So I did.
It turned out to be pretty easy. First I created a config/deploy directory and started splitting up the deploy.rb into smaller chunks:
$ ls -1 config/deploy gem_eval.rb import.rb notify.rb queue.rb services.rb settings.rb sudo_everywhere.rb symlinks.rb
Then I pulled them in. Careful here: Capistrano override both load and require so it’s probably best to just use load.
This separation kept the deploy.rb and each specific file small and focused.
Next I thought about how I’d do Git-based deployment. Not too different from Capistrano’s remote_cache, really. Just get rid of all the timestamp directories and have the current directory contain our clone of the Git repo. Do a fetch then reset to deploy. Rollback? No problem.
The best part is that because Engine Yard’s gemified tasks and our own code both call standard Capistrano tasks like deploy and deploy:update, we can just replace them and not change the dependent code.
Here’s what our new deploy.rb looks like. Well, the meat of it at least:
Great. I like this – very Gitty and simple. But copying and removing directories wasn’t the only slow part of our deploy process.
Every Capistrano task you run adds a bit of overhead. I don’t know exactly why, but I imagine each task opens a fresh SSH connection to the necessary servers. Maybe. Either way, the less tasks you run the better.
We were running about eight symlink related tasks during each deploy. Config files and cache directories that only live on the server need to be symlinked into the app’s directory structure after the reset. Cutting these actions down to a single task made everything much, much faster.
Here’s our symlinks.rb:
Solution? Do the minimizing and bundling on the servers. The beefy, beefy servers:
As long as the bundle Rake tasks don’t need to load the Rails environment (which ours don’t), this is much faster.
We moved to a more Git-like deployment setup, cut down the number of tasks we run, and moved bundling and minimizing JS and CSS from our localhost to the server. Did it help?
As I said before, a GitHub deploy can take 15 minutes (not counting server restarts). My goal was to drop it down to 1 minute. How’d we do?
$ time cap production deploy * executing `production' * executing `deploy' triggering before callbacks for `deploy:update' * executing `notify:campfire' * executing `deploy:update' * executing `deploy:update_code' triggering after callbacks for `deploy:update_code' * executing `symlinks:make' * executing `deploy:bundle' * executing `deploy:restart' * executing `mongrel:restart' * executing `deploy:cleanup' real 0m14.361s user 0m2.049s sys 0m0.560s
15 minutes down to 14 seconds. Not bad.