Performance at GitHub: deferring stats with rack.after_reply

How we sped up GitHub.com by moving slow, non-critical code into rack.after_reply.

| 4 minutes

Performance is an essential aspect of any production application, and GitHub is no exception. In the last year, we’ve been making significant investments to improve the performance of some of our most critical workflows. There’s a lot to share, but today we’re going to focus on a little-known feature of some Rack-powered web servers called rack.after_reply that saved us 30ms p50 and 50ms+ p99 on every request across the site.

The problem

First, let me talk about how we found ourselves looking at this Rack web server functionality. When diving into performance, we often look at flamegraphs that reveal where time is being spent for a given request. Eventually, we started to notice a pattern. It turned out we were spending a significant amount of time sending statsd metrics throughout a request. In fact, it could be up to 65ms per request! While telemetry enables us to improve GitHub, it doesn’t help users directly, especially when it impacts the performance of our page loads.

There’s no need to send telemetry immediately, especially if it’s expensive, so we started discussing ways to remove telemetry network calls out of the critical path. This resulted in the question, “Can we batch all of our telemetry calls and send them at once?” There’s far too much data to use a background job, so we had to look at other potential solutions. We came across Rack::Events and spiked out a proof of concept. Unfortunately, this approach caused any work performed in Rack::Events callbacks to block the response from closing. This resulted in two issues. First, users would see the browser loading indicator until the deferred code was complete. Second, this impacted our metrics since response times still included the time it took to run our deferred code.

A potential path forward

Recognizing the potential, we kept digging. Eventually, we came across a feature that seemed to do exactly what we wanted, rack.after_reply. The Puma source code describes rack.after_reply as: “A rack extension. If the app writes #call’ables to this array, we will invoke them when the request is done.” In other words, this would allow us to collect telemetry metrics during a request and flush them in a single batch after the browser received the full response, avoiding the issue of network calls slowing down response times. There was a problem though. We don’t use Puma at GitHub, we use Unicorn. 😱

We had a clear path forward. All we needed to do was implement rack.after_reply in Unicorn. This functionality has a lot of use cases outside of sending telemetry metrics, so we contributed rack.after_reply back to Unicorn. This allowed us to write a small wrapper class around rack.after_reply, making it easier to use in the application:

class AfterResponse

  def initialize(env)

    env[“rack.after_reply”] ||= []

    env[“rack.after_reply”] << -> do

      self.call

    end

    @to_perform = []

  end

  # Calls each callable defined via #perform

  def call

    @to_perform.each do |block|

      begin

        block.call(self)

      rescue Object => e

        Rails.logger.error(e)

      end

    end

  end

  # Adds given block to the array of callables that will

  # be called after the user has received the response.

  def perform(name, &block)

    @to_perform << block

  end

end

With the hard part completed, we wrote a small wrapper around our telemetry class to buffer all of our stats data. Combining that with a middleware that performs roughly the following, users now receive responses 30ms faster P50 across all pages. P99 response times improved too, with a 50ms+ reduction across the site.

GitHub.after_response.perform do

  GitHub.statsd.flush!

end

In conclusion

There are a few drawbacks to this approach that are worth mentioning. The primary disadvantage is that a Unicorn worker executes the rack.after_reply code, making it unable to serve another HTTP request until your callables finish running. If not kept in check, this can potentially increase HTTP queueing, the time spent waiting for a web server worker to serve a request. We mitigated this by adding timeout behavior to prevent any code from running too long.

It’s a little early to talk about it, but there is progress toward making this functionality an official part of the Rack spec. A recently released gem also implements additional functionality around this feature called maybe_later that’s worth checking out.

Overall, we’re pleased with the results. Unicorn-powered applications now have a new tool at their disposal, and our customers reap the benefits of this performance enhancement!

Tags:

Related posts