Over the past few months I’ve been working on a major new version of Ernie, the RPC server I wrote to power GitHub’s sharded file server architecture. As a reminder, Ernie is an Erlang/Ruby hybrid BERT-RPC server (packaged as a Rubygem) that lets you expose Ruby modules as an RPC service. It spawns, manages, and load balances between a set of Ruby processes that allow access to the Git repositories.
Over the past four months Ernie has proven to be extremely stable and reliable. Each of our five file servers is handling an average of 50 req/sec (4.3 million req/day), bursting up to 200 req/sec, and transferring over 11GB of data per day (just to the web frontend and jobs; this number does not reflect pushes/pulls/clones/etc.).
Last week I released Ernie 2.0 and upgraded all of our file servers to use it. Last night I released and upgraded everything to 2.1. Here’s a breakdown of what’s new in Ernie 2.0/2.1 and how we’re using these new features to give you an even better GitHub experience.
The biggest new feature in Ernie 2.0 is the ability to define handlers in pure Erlang (instead of just Ruby). These are known as native handlers. Native handlers execute within the Erlang server’s VM and therefore do not have concurrency limits like the Ruby handlers. In addition, the round trip to an external process is eliminated, boosting overall performance for those functions. About half of the RPC calls that are issued to Ernie are very simple file existence checks. By implementing these actions in pure Erlang, we’ve reduced the amount of work being done by the Ruby processes and freed them up for other tasks, all of which means our maximum concurrency has increased significantly.
Ernie no longer assumes that your handlers are written in Ruby. You can now use any language to implement your handlers as long as they speak the external handler protocol. Non-Erlang handlers are called external handlers. Currently only Ruby support is included in the distribution, but I’ll be adding other languages as the need arises or as contributions come in.
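For reference, an external handler in Ruby is just a module of plain methods. Here is a minimal sketch of a file existence check written that way. The `RepoFiles` module and its `exists` method are hypothetical names, and the `Ernie.expose` call reflects my understanding of how the gem registers a module, so treat it as an assumption and check the README before relying on it.

```ruby
# A plain Ruby module whose methods would serve as RPC functions.
# The module and method names are hypothetical.
module RepoFiles
  # True if the given path exists on this file server.
  def exists(path)
    File.exist?(path)
  end
end

# Assumption: with the ernie gem loaded, the module is registered under an
# RPC module name through Ernie.expose. The guard keeps this sketch runnable
# when the gem is not installed.
Ernie.expose(:repo_files, RepoFiles) if defined?(Ernie)

# Exercising the handler method directly, without any RPC layer.
handler = Object.new.extend(RepoFiles)
present = handler.exists(Dir.pwd)
missing = handler.exists(File.join(Dir.pwd, "no-such-file-ernie-sketch"))
```

Simple checks like this one are exactly the kind of call that benefits from moving to a native handler, since the Ruby process adds nothing but a round trip.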
As time goes by, we will be converting more and more Ruby to Erlang to take advantage of the native handlers that Ernie 2.x supports. To make this task as simple as possible, Ernie supports a concept called shadowing. If you define an external handler and a native handler of the same name, Ernie will check the native handler for an exported function of the requested name and use it if it exists. If it does not, it will fall back to the external handler. This feature makes it incredibly simple to migrate functions one at a time to pure Erlang without having to change a single line of client code!
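The lookup order can be sketched in a few lines of Ruby. This is only a toy model of the dispatch rule, not Ernie's actual code (the real lookup happens inside the Erlang server). The `Dispatcher` class, the handler tables, and the `:native`/`:external` tags are all hypothetical names made up for illustration.

```ruby
# Toy model of shadowing: a native handler wins if it exports the function,
# otherwise the call falls back to the external handler.
# All names here are hypothetical; this is not Ernie's API.
class Dispatcher
  def initialize(native:, external:)
    @native = native      # { module_name => { function_name => callable } }
    @external = external  # same shape, standing in for the Ruby workers
  end

  # Returns [kind, result], where kind is :native or :external.
  def call(mod, fun, *args)
    if (impl = @native.dig(mod, fun))
      [:native, impl.call(*args)]
    elsif (impl = @external.dig(mod, fun))
      [:external, impl.call(*args)]
    else
      raise NoMethodError, "no handler for #{mod}:#{fun}"
    end
  end
end

external = { calc: { add: ->(a, b) { a + b }, mul: ->(a, b) { a * b } } }
native   = { calc: { add: ->(a, b) { a + b } } } # only add has been migrated

dispatcher = Dispatcher.new(native: native, external: external)
migrated = dispatcher.call(:calc, :add, 1, 2) # served from the native table
fallback = dispatcher.call(:calc, :mul, 3, 4) # falls back to the external table
```

The caller uses the same module and function name in both cases, which is why a function can be migrated without touching client code.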
In addition to basic shadowing, you can choose whether to run the native or external version of a specific function based on the arguments. This is called predicate shadowing and is accomplished by returning a boolean from a complementary native function whose name is derived from myfun, the name of your function. We use this to selectively implement parts of the proxied Grit in pure Erlang.
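Here is a rough Ruby sketch of that routing decision. It models the predicate as a separate lookup table rather than following Ernie's naming convention, and everything in it (`PredicateRouter`, the `fib` example, the threshold of 10) is hypothetical and chosen only to show the rule.

```ruby
# Toy model of predicate shadowing. Hypothetical names throughout.
# The native version runs only when it exists and its predicate (if any)
# returns true for the given arguments; otherwise the external version runs.
class PredicateRouter
  def initialize(native:, predicates:, external:)
    @native = native
    @predicates = predicates
    @external = external
  end

  # Returns [kind, result], where kind is :native or :external.
  def call(fun, *args)
    native = @native[fun]
    pred = @predicates[fun]
    if native && (pred.nil? || pred.call(*args))
      [:native, native.call(*args)]
    else
      [:external, @external.fetch(fun).call(*args)]
    end
  end
end

# Stand-in for the external (slow, recursive) implementation.
slow_fib = ->(n) { n < 2 ? n : slow_fib.call(n - 1) + slow_fib.call(n - 2) }

# Stand-in for the native (fast, iterative) implementation.
fast_fib = lambda do |n|
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

router = PredicateRouter.new(
  native: { fib: fast_fib },
  predicates: { fib: ->(n) { n >= 10 } }, # native path only for larger inputs
  external: { fib: slow_fib }
)

small = router.call(:fib, 5)  # predicate is false, external version runs
large = router.call(:fib, 10) # predicate is true, native version runs
```

A function with no predicate behaves like plain shadowing: the native version is always used when it exists.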
Requests can now be classified as either high or low priority. Ernie will immediately process any connections marked as high priority. Low priority connections will only be processed if there are no high priority connections pending. We will be using this feature to keep low priority jobs from saturating the file servers with requests that are not time critical, thereby keeping the servers responsive to website requests. While this specific treatment of the high/low queue is rather rudimentary, I plan to include more advanced strategies in a later release. This is really just the groundwork.
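The selection rule is simple enough to model in a few lines of Ruby. This is a sketch of the behavior described above, not the server's implementation. The `PriorityQueues` class is a hypothetical name, and first-in, first-out ordering within each queue is my assumption.

```ruby
# Toy model of the high/low priority rule. Hypothetical names.
class PriorityQueues
  def initialize
    @high = []
    @low = []
  end

  def push(request, priority:)
    case priority
    when :high then @high.push(request)
    when :low  then @low.push(request)
    else raise ArgumentError, "unknown priority: #{priority.inspect}"
    end
    self
  end

  # Next request to process: a pending high priority request always goes
  # first; a low priority request is taken only when the high queue is
  # empty. Returns nil when both queues are empty.
  def pop
    @high.empty? ? @low.shift : @high.shift
  end

  def lengths
    { high: @high.length, low: @low.length }
  end
end

queues = PriorityQueues.new
queues.push("job-1", priority: :low)
queues.push("web-1", priority: :high)
queues.push("web-2", priority: :high)

order = []
while (request = queues.pop)
  order << request
end
```

Note that under this rule a steady stream of high priority requests can starve the low priority queue indefinitely, which is one reason a more advanced strategy is attractive.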
Ernie 2.1 introduces a proper access log to make it simple to track what your Ernies are up to. The log file contains the message type (access or error), the time of the initial connection, the number of seconds between connection and when the request is selected for processing, the number of seconds the request took to process, the lengths of the high and low priority queues, the type of handler (native or external), the priority of the request (high or low), and the first 150 bytes of the request.
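To make the list of fields concrete, here is a Ruby sketch that assembles one log line from them. The field order follows the description above, but the space separated layout, the timestamp format, and the `format_access_line` helper are assumptions made for illustration and do not reproduce Ernie's actual log format.

```ruby
# Hypothetical formatter for one access log line. The layout is an
# assumption; only the set of fields comes from the description above.
def format_access_line(type:, connected_at:, wait_seconds:, process_seconds:,
                       high_queue:, low_queue:, handler:, priority:, request:)
  [
    type.to_s.upcase,                                  # ACCESS or ERROR
    connected_at.getutc.strftime("%Y-%m-%dT%H:%M:%SZ"), # time of connection
    format("%.6f", wait_seconds),                      # time spent queued
    format("%.6f", process_seconds),                   # time spent processing
    high_queue,                                        # high priority queue length
    low_queue,                                         # low priority queue length
    handler,                                           # native or external
    priority,                                          # high or low
    request.byteslice(0, 150).inspect                  # first 150 bytes of the request
  ].join(" ")
end

line = format_access_line(
  type: :access,
  connected_at: Time.at(0),
  wait_seconds: 0.0012,
  process_seconds: 0.25,
  high_queue: 0,
  low_queue: 3,
  handler: :native,
  priority: :high,
  request: "x" * 200
)
```

Keeping the wait time and the processing time as separate fields makes it easy to tell a slow handler apart from a request that simply sat in the queue.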
BERT and BERT-RPC, along with our Ruby and Erlang client/server implementations, have made it possible for us to build a high performance, sharded file system architecture for a vanishingly small amount of money. We currently have five terabytes of active storage exposed via BERT-RPC and are adding a new file server pair every few months. In the long run, I intend to make Ernie the most robust and flexible RPC server available while preserving the simplicity of writing handler code in the language of your choice. Keep an eye on the project; there are plenty more improvements to come!