Context aware MySQL pools via HAProxy
At GitHub we use MySQL as our main datastore. While repository data lies in git, metadata is stored in MySQL. This includes Issues, Pull Requests, Comments, etc. We also auth against MySQL via a custom git proxy (babeld). To be able to serve under the high load GitHub operates at, we use MySQL replication to scale out read load.
We have different clusters providing different types of services, but the single-writer-multiple-readers design applies to them all. Depending on traffic growth, application demand, operational tasks, or other constraints, we take replicas in or out of our pools. Depending on workloads, some replicas may lag more than others.
Displaying up-to-date data is important. We have tooling that helps us ensure we keep replication lag at a minimum, and typically it doesn’t exceed 1 second. However, sometimes lags do happen, and when they do, we want to put aside those lagging replicas, let them catch their breath, and avoid sending traffic their way until they are caught up.
We set out to create a self-managing topology that will exclude lagging replicas automatically, handle disasters gracefully, and yet allow for complete human control and visibility.
HAProxy for load balancing replicas
We use HAProxy for various tasks at GitHub. Among others, we use it to load balance our MySQL replicas. Our applications connect to HAProxy servers at :3306 and are routed to replicas that can serve read requests. Exactly what makes a replica able to “serve read requests” is the topic of this post.
MySQL load balancing via HAProxy is commonly used, but we wanted to tackle a few operational and availability concerns:
- Can we automate exclusion and inclusion of backend servers based on replication status?
- Can we automate exclusion and inclusion of backend servers based on server role?
- How can we react to a scenario where too many servers are excluded, and we are only left with one or two “good” replicas?
- Can we always serve?
- How easy would it be to override pool membership manually?
- Will our solution survive a service haproxy reload/restart?
With these criteria in mind, the standard mysql-check commonly used in HAProxy-MySQL load balancing will not suffice. This simple check merely tests whether a MySQL server is live; it gains no insight into the server’s internal replication state (lagging/broken) or its operational state (maintenance/ETL/backup jobs, etc.).
Instead, we make our HAProxy pools context aware. We let the backend MySQL hosts make an informed decision: “should I be included in a pool or should I not?”
Context aware MySQL pools
In its simplest form, context awareness begins with asking the backend MySQL replica: “are you lagging?” We will reach far beyond that, but let’s begin by describing this commonly used setup.
In this situation, HAProxy no longer uses a mysql-check but rather an http-check. The MySQL backend server provides an HTTP interface, responding with HTTP 200 or HTTP 503 depending on replication lag. HAProxy will interpret these as “good” (UP) or “bad” (DOWN), respectively. On the HAProxy side, it looks like this:
backend mysql_ro_main
option httpchk GET /
balance roundrobin
retries 1
timeout connect 1000
timeout check 300
timeout server 86400000
default-server port 9876 fall 2 inter 5000 rise 1 downinter 5000 on-marked-down shutdown-sessions weight 10
server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
server my-db-0006 my-db-0006.heliumcarbon.com:3306 check
The backend servers need to provide an HTTP service on :9876. That service would connect to MySQL, check for replication lag, and return with 200 (say, lag <= 5s) or 503 (lag > 5s or replication is broken).
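A minimal sketch of such a check service (not our implementation; it assumes a local mysql client with credentials in place, uses Seconds_Behind_Master as a crude stand-in for a proper lag measurement, and is meant to run behind xinetd or a similar wrapper that ties stdin/stdout to the TCP connection on :9876) could look like this:

#!/bin/bash
# Hypothetical lag check: respond HTTP 200 when replication is healthy and
# lag <= 5 seconds, HTTP 503 otherwise.
max_lag=5

read -r _   # consume HAProxy's request line; we only need to respond

respond() {
  # $1: status line, $2: body
  printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n%s\n' "$1" "$2"
  exit 0
}

# Seconds_Behind_Master is NULL when replication is stopped or broken.
lag="$(mysql -NBe 'SHOW SLAVE STATUS\G' 2>/dev/null | awk '$1 == "Seconds_Behind_Master:" {print $2}')"

if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
  respond "503 Service Unavailable" "replication is broken or not running"
elif [ "$lag" -le "$max_lag" ]; then
  respond "200 OK" "lag is ${lag}s"
else
  respond "503 Service Unavailable" "lag is ${lag}s"
fi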
Some reflections
This commonly used setup automatically excludes or includes backend servers based on replication status. If the server is lagging, the specialized HTTP service will report 503, which HAProxy will interpret as DOWN, and the server will not serve traffic until it recovers.
But what happens when two, three, or four replicas are lagging? We are left with less and less serving capacity, and the remaining replicas receive two or three times more traffic than they’re used to. If this happens, they might succumb to the load and lag as well; the solution above might not be able to handle an entire fleet of lagging replicas.
What’s more, some of our replicas have special roles. For example, each cluster has a node running continuous logical or physical backups; other nodes might serve a purely analytical workload or be partially weighted to verify a newer MySQL version.
In the past, we would update the HAProxy config file with the list of servers as they came and went. As we grew in volume and in number of servers this became an operational overhead. We’d rather take a more dynamic approach that provides increased flexibility.
We may also perform a MySQL master failover. This may be a planned operation (e.g. upgrading to the latest release) or an unplanned one (e.g. automated failover on hardware failure). The new master must be excluded from the read pool; the old master, if available, may now serve reads. Again, we wish to avoid having to update HAProxy’s configuration with these changes.
Static HAProxy configuration, dynamic decisions
In our current setup the HAProxy configuration does not change regularly. It may change when we introduce new hardware now and then, but otherwise it is static, and HAProxy reacts to ongoing instructions from the backend servers telling it:
- I’m good to participate in a pool (HTTP 200)
- I’m in a bad state; don’t send traffic my way (HTTP 503)
- I’m in maintenance mode. No error on my side, but don’t send traffic my way (HTTP 404)
The HAProxy config file lists each and every known server. The list includes the backup server, the analytics server, and even the master. The backend servers themselves tell HAProxy whether they wish to participate in taking read traffic or not.
Before showing you how to implement this, let’s consider availability.
Graceful failover of pools
HAProxy supports multiple backend pools per frontend, and provides Access Control Lists (ACLs). ACLs often use incoming connection data (headers, cookies, etc.) but are also able to observe backend status.
The scheme is to define two (or more) backend pools:
- The first (“main”/”normal”) pool consists of replicas with acceptable lag that are able to serve traffic, as above
- The second (“backup”) pool consists of valid replicas which are allowed to be lagging
We use an acl that observes the number of available servers in our main backend. We then set a rule to use the backup pool if that acl applies:
frontend mysql_ro
...
acl mysql_not_enough_capacity nbsrv(mysql_ro_main) lt 3
use_backend mysql_ro_backup if mysql_not_enough_capacity
default_backend mysql_ro_main
In the example above we choose to switch to the mysql_ro_backup pool when left with less than three active hosts in our mysql_ro_main pool. We’d rather serve stale data than stop serving altogether. Of course, by this time our alerting system will have alerted us to the situation and we will already be looking into the source of the problem.
Remember that it’s not HAProxy that makes the decision “who’s in and who’s out” but the backend server itself. To that effect, HAProxy sends a check hint to the server. We choose to send the hint in the form of a URI, as this makes for readable, clear code:
backend mysql_ro_main
option httpchk GET /check-lag
http-check disable-on-404
balance roundrobin
retries 1
timeout connect 1000
timeout check 300
timeout server 86400000
default-server port 9876 fall 2 inter 5000 rise 1 downinter 5000 on-marked-down shutdown-sessions weight 10
server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
server my-db-0006 my-db-0006.heliumcarbon.com:3306 check
backend mysql_ro_backup
option httpchk GET /ignore-lag
http-check disable-on-404
balance roundrobin
retries 1
timeout connect 1000
timeout check 300
timeout server 86400000
default-server port 9876 fall 2 inter 10000 rise 1 downinter 10000 on-marked-down shutdown-sessions weight 10
server my-db-0001 my-db-0001.heliumcarbon.com:3306 check
server my-db-0002 my-db-0002.heliumcarbon.com:3306 check
server my-db-0003 my-db-0003.heliumcarbon.com:3306 check
server my-db-0004 my-db-0004.heliumcarbon.com:3306 check
server my-db-0005 my-db-0005.heliumcarbon.com:3306 check
server my-db-0006 my-db-0006.heliumcarbon.com:3306 check
Both backend pools list the exact same servers. The major difference between the pools is the check URI:
option httpchk GET /check-lag
vs.
option httpchk GET /ignore-lag
As the URI suggests, the first, main pool is looking for backend servers that do not lag (and we wish to also exclude the master, the backup server, etc.). The backup pool is happy to take servers that actually do lag, but it still wishes to exclude the master and other special servers.
HAProxy’s behavior is to use the main pool for as long as at least three replicas are happy to serve data. If only two or fewer replicas are in good shape, HAProxy switches to the backup pool, where we re-introduce the lagging replicas: serving more stale data, but still serving.
Also noteworthy in the above is http-check disable-on-404, which puts a server that responds with HTTP 404 into a NOLB state. We will discuss this in more detail soon.
Implementing the HTTP service
Any HTTP service implementation will do. At GitHub, we commonly use shell and Ruby scripts that integrate well with our ChatOps. We have many reliable shell building blocks, and our current solution is a shell-oriented service, in the form of xinetd.
xinetd makes it easy to “speak HTTP” via shell. A simplified setup looks like this:
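(Our exact service definition isn’t reproduced here; the following is an illustrative sketch, in which the service name and user are assumptions.)

# Illustrative xinetd service definition; name, user and settings are assumptions.
service mysql-check
{
        type            = UNLISTED
        port            = 9876
        socket_type     = stream
        protocol        = tcp
        wait            = no
        user            = nobody
        # hand each incoming HTTP check request over to the shell dispatcher
        server          = /path/to/scripts/xinetd-mysql
        disable         = no
}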
In the above, we’re particularly interested that xinetd serves on :9876 and calls upon /path/to/scripts/xinetd-mysql to respond to HAProxy’s check requests.
Implementing the check script
The xinetd-mysql script routes the request to an appropriate handler. Recall that we asked HAProxy to send a hint per check. The hint URI, such as /check-lag, is intercepted by xinetd-mysql, which then invokes a dedicated handler for this check. Thus, we have different handlers for /check-lag, /ignore-lag, /ignore-lag-and-yes-please-allow-backup-servers-as-well, etc.
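The dispatcher itself isn’t reproduced here; a hedged sketch of the idea, with an assumed handler directory and an assumed exit-code convention (0 -> 200, 2 -> 404, anything else -> 503), could look like this:

#!/bin/bash
# Illustrative xinetd-mysql dispatcher: xinetd ties stdin/stdout to the TCP
# connection, so we read HAProxy's request line, pick a handler by URI, and
# write a minimal HTTP response back.

read -r method uri version
uri="${uri%$'\r'}"                    # strip the request line's trailing CR

handlers="/path/to/scripts/checks"    # hypothetical directory of per-check handlers

case "$uri" in
  /check-lag)  handler="$handlers/check-lag"  ;;
  /ignore-lag) handler="$handlers/ignore-lag" ;;
  # ...more mappings for the other check URIs...
  *)           handler="" ;;
esac

if [ -n "$handler" ] && [ -x "$handler" ]; then
  body="$("$handler" 2>/dev/null)"
  case "$?" in
    0) status="200 OK" ;;                   # include me in the pool
    2) status="404 Not Found" ;;            # opted out (role/maintenance): NOLB
    *) status="503 Service Unavailable" ;;  # lagging or broken: DOWN
  esac
else
  status="503 Service Unavailable"
  body="no handler for ${uri}"
fi

printf 'HTTP/1.1 %s\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n%s\n' "$status" "$body"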
The real magic happens when running this handler script. This is where the server makes the decision: “Should I be included in the read-pool or not?” The script bases its decision on the following factors:
- Did a human suggest that this server be explicitly included/excluded? This is just a matter of touching a file.
- Is this the master server? A backup server? Something else? The server happens to know its own role via service discovery or even via puppet. We check for a hint file.
- Is MySQL lagging? Is it alive at all? This (finally) executes a self-check on MySQL. For lag we use a heartbeat mechanism, but your mileage may vary.
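As a concrete, though entirely hypothetical, illustration of those three factors, a check-lag handler following the exit-code convention assumed above could look like this (the hint-file locations are made up, and in production we measure lag via heartbeat rather than Seconds_Behind_Master):

#!/bin/bash
# Hypothetical check-lag handler: exit 0 to join the pool (200), exit 2 to opt
# out (404/NOLB: maintenance or special role), exit 1 when lagging or broken
# (503/DOWN).
max_lag=5
hints="/etc/mysql-pool"   # assumed directory for human- and role-hint files

# 1. Did a human explicitly exclude this server? Touching a file is enough.
if [ -e "$hints/force-disable" ]; then
  echo "disabled by operator"; exit 2
fi

# 2. Special role? (master, backup, analytics...) The hint file is written by
#    service discovery or puppet.
if [ -e "$hints/role-excluded" ]; then
  echo "excluded by role"; exit 2
fi

# 3. Ask MySQL itself (heartbeat in production; Seconds_Behind_Master here).
lag="$(mysql -NBe 'SHOW SLAVE STATUS\G' 2>/dev/null | awk '$1 == "Seconds_Behind_Master:" {print $2}')"

if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
  echo "replication is broken or not running"; exit 1
elif [ "$lag" -le "$max_lag" ]; then
  echo "lag is ${lag}s"; exit 0
else
  echo "lag is ${lag}s"; exit 1
fi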
This xinetd/shell implementation means we do not use persistent MySQL connections; each check generates a new connection on the backend server. While this seems wasteful, the rate of incoming check requests is not high, and it is negligible at the scale of our busy servers. Furthermore, this better serves our trust in the system: a hogged server may be able to serve existing connections but refuse new ones; we’re happy to catch this scenario.
404 or error?
Servers that just don’t want to participate send a 404, causing them to go NOLB. Lagging, broken or dead replicas send a 503. This makes it easier on our alerting system and makes it clearer when we have a problem.
One outstanding issue is that HAProxy never transitions from DOWN to NOLB; the automaton requires first going UP. This is not an integrity problem but it causes more alerting. We work around this by cross-checking servers and refreshing if need be. This is a rare situation for us and thus of no significant concern.
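Purely as an illustration (not necessarily how our tooling does it), a server can be cycled by hand through HAProxy’s admin socket, assuming the stats socket is enabled at admin level:

# Assumes: stats socket /var/run/haproxy.sock level admin   (in haproxy.cfg)
# Disabling and re-enabling a server cycles it through MAINT; health checks
# resume when it is re-enabled.
echo "disable server mysql_ro_main/my-db-0003" | socat stdio /var/run/haproxy.sock
echo "enable server mysql_ro_main/my-db-0003"  | socat stdio /var/run/haproxy.sock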
Operations
This small-building-blocks design permits simple unit testing. Control and visibility are easily gained: disabling and enabling a server is just a matter of creating a file, whether touched by a human or implied by server role.
These scripts integrate well within our ChatOps. We are able to see the exact response HAProxy sees via simple chatops commands, and we can also interfere and force backends in or out of the pools.
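For example, with a hint-file layout like the hypothetical one sketched earlier, pulling a replica out of the read pools and putting it back is just:

sudo touch /etc/mysql-pool/force-disable   # reported as 404 on the next check: NOLB
sudo rm /etc/mysql-pool/force-disable      # rejoin the pools once maintenance is done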
We have specialized monitoring for these HAProxy boxes, but we don’t wish to be notified if a single replica starts to lag. Rather, we’re interested in the bigger picture: a summary of the errors found across the pools. This means there’s a difference between a half-empty main pool and a completely empty one. In the event of problems, we get a single alert that summarizes the status across the cluster’s pools. As always, we can also check from chatops.
We’ve stripped our script and config files to decouple them from GitHub’s specific setup and flow. We’ve also open sourced them in the hope that you’ll find them useful, and that they’ll help you implement your own solution with context-aware MySQL replica pools.