Vulcanizer: a library for operating Elasticsearch


At GitHub, we use Elasticsearch as the main technology backing our search services. In order to administer our clusters, we use ChatOps via Hubot. As of 2017, those commands were a collection of Bash and Ruby-based scripts.

Although this served our needs for a time, it was becoming increasingly apparent that these scripts lacked composability and reusability. It was also difficult to contribute back to the community by open sourcing any of these scripts, because they were specific to GitHub’s bespoke infrastructure.

Why build something new?

There are plenty of excellent Elasticsearch libraries, both official and community driven. For Ruby, GitHub has already released the Elastomer library, and for Go we use the Elastic library by user olivere. However, these libraries focus primarily on indexing and querying data. This is exactly what an application needs to use Elasticsearch, but it’s not the same set of tools that operators of an Elasticsearch cluster need. We wanted a high-level API that corresponded to the common operations we performed on a cluster, such as disabling allocation or draining the shards from a node. Our goal was a library that focused on these administrative operations and that our existing tooling could easily use.

Full speed ahead with Go…

We started looking into Go and were inspired by GitHub’s success with freno and orchestrator.

Go’s structure encourages building composable software (self-contained, stateless components that can be selected and assembled), and we saw it as a good fit for this application.

… Into a wall

We initially scoped the project out to be a packaged chat app and planned to open source only what we were using internally. During implementation, however, we ran into a few problems:

  • GitHub uses a simple protocol called ChatOps RPC, based on JSON-RPC over HTTPS. However, ChatOps RPC is not widely adopted outside of GitHub, which would make it difficult for most parties to integrate our application into their ChatOps infrastructure.
  • The internal REST library our ChatOps commands relied on had not been open sourced, and some of its dependencies would need to be open sourced as well. We’ve started the process of open sourcing this library and its dependencies, but it will take some time.
  • We relied on Consul for service discovery, which not everyone uses.

Based on these factors we decided to break out the core of our library into a separate package that we could open source. This would decouple the package from our internal libraries, Consul, and ChatOps RPC.

The package would only have a few goals:

  • Access the REST endpoints on a single host.
  • Perform an action.
  • Provide results of the action.

This package could then be open sourced without being tied to our internal infrastructure, so that anyone could use it with whatever ChatOps infrastructure, service discovery, or tooling they choose.

To that end, we wrote vulcanizer.
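As a rough illustration of those three goals, here is a minimal sketch of a client that is bound to a single host, performs one action (reading `/_cluster/health`), and returns a typed result. The `Client`, `Health`, `NewClient`, `decodeHealth`, and `ClusterHealth` names are hypothetical and are not Vulcanizer’s actual API; the sketch assumes plain HTTP on the given port and only decodes the `status` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Client talks to the REST endpoints of exactly one Elasticsearch host.
// Hypothetical type for illustration; not Vulcanizer's own Client.
type Client struct {
	BaseURL string
	HTTP    *http.Client
}

// NewClient targets a single host and port: no service discovery, no chat logic.
func NewClient(host string, port int) *Client {
	return &Client{
		BaseURL: fmt.Sprintf("http://%s:%d", host, port),
		HTTP:    &http.Client{Timeout: 10 * time.Second},
	}
}

// Health is the subset of the _cluster/health response this sketch uses.
type Health struct {
	Status string `json:"status"`
}

// decodeHealth turns a raw _cluster/health body into a typed result.
func decodeHealth(body []byte) (Health, error) {
	var h Health
	err := json.Unmarshal(body, &h)
	return h, err
}

// ClusterHealth performs one action (GET /_cluster/health) and returns its result.
func (c *Client) ClusterHealth() (Health, error) {
	resp, err := c.HTTP.Get(c.BaseURL + "/_cluster/health")
	if err != nil {
		return Health{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Health{}, fmt.Errorf("GET /_cluster/health: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Health{}, err
	}
	return decodeHealth(body)
}

func main() {
	c := NewClient("localhost", 9200)
	h, err := c.ClusterHealth()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cluster status:", h.Status)
}
```

Because nothing in a client shaped like this knows about Consul, Hubot, or ChatOps RPC, any of those layers can be built on top of it rather than inside it.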

Vulcanizer

Vulcanizer is a Go library for interacting with an Elasticsearch cluster. It is not meant to be a full-fledged Elasticsearch client. Its goal is to provide a high-level API to help with common tasks that are associated with operating an Elasticsearch cluster such as querying health status of the cluster, migrating data off of nodes, updating cluster settings, and more.

Examples of the Go API

Elasticsearch is great in that almost anything you’d want to accomplish can be done through its HTTP interface, but you don’t want to write JSON by hand, especially during an incident. Below are a few examples of how we use Vulcanizer for common tasks and the equivalent curl commands. The Go examples are simplified and don’t show error handling.

Getting nodes of a cluster

You’ll often want to list the nodes in your cluster to pick out a specific node or to see how many nodes of each type you have in the cluster.

$ curl localhost:9200/_cat/nodes?h=master,role,name,ip,id,jdk
- mdi vulcanizer-node-123 172.0.0.1 xGIs 1.8.0_191
* mdi vulcanizer-node-456 172.0.0.2 RCVG 1.8.0_191

Vulcanizer exposes typed structs for these types of objects.

v := vulcanizer.NewClient("localhost", 9200)

nodes, err := v.GetNodes()

fmt.Printf("Node information: %#v\n", nodes[0])
// Node information: vulcanizer.Node{Name:"vulcanizer-node-123", Ip:"172.0.0.1", Id:"xGIs", Role:"mdi", Master:"-", Jdk:"1.8.0_191"}
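To show what typed structs buy you over parsing the text table, here is a sketch that decodes the JSON form of the same request (the `_cat` APIs accept `format=json`) and answers the two questions above: which node is the elected master, and how many nodes have each role. The `Node` type and the `parseNodes`, `electedMaster`, and `countByRole` helpers are hypothetical, not Vulcanizer’s code, and the sketch assumes the JSON keys match the column names passed in `h=`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node mirrors the columns requested from _cat/nodes via
// h=master,role,name,ip,id,jdk. Hypothetical type for illustration.
type Node struct {
	Master string `json:"master"`
	Role   string `json:"role"`
	Name   string `json:"name"`
	IP     string `json:"ip"`
	ID     string `json:"id"`
	JDK    string `json:"jdk"`
}

// parseNodes decodes the body of _cat/nodes?format=json&h=... into typed structs.
func parseNodes(body []byte) ([]Node, error) {
	var nodes []Node
	if err := json.Unmarshal(body, &nodes); err != nil {
		return nil, err
	}
	return nodes, nil
}

// electedMaster returns the node whose master column is "*".
func electedMaster(nodes []Node) (Node, bool) {
	for _, n := range nodes {
		if n.Master == "*" {
			return n, true
		}
	}
	return Node{}, false
}

// countByRole tallies nodes per role string, such as "mdi" or "di".
func countByRole(nodes []Node) map[string]int {
	counts := make(map[string]int)
	for _, n := range nodes {
		counts[n.Role]++
	}
	return counts
}

func main() {
	// Sample body shaped like the curl output above, in JSON form.
	body := []byte(`[
		{"master":"-","role":"mdi","name":"vulcanizer-node-123","ip":"172.0.0.1","id":"xGIs","jdk":"1.8.0_191"},
		{"master":"*","role":"mdi","name":"vulcanizer-node-456","ip":"172.0.0.2","id":"RCVG","jdk":"1.8.0_191"}
	]`)
	nodes, err := parseNodes(body)
	if err != nil {
		panic(err)
	}
	if m, ok := electedMaster(nodes); ok {
		fmt.Println("elected master:", m.Name) // elected master: vulcanizer-node-456
	}
	fmt.Println("nodes per role:", countByRole(nodes)) // nodes per role: map[mdi:2]
}
```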

Update the max recovery cluster setting

The index recovery speed is a common setting to update when you want to balance recovery time against I/O pressure across your cluster. The curl version has a lot of JSON to write.

$ curl -XPUT localhost:9200/_cluster/settings -d '{ "transient": { "indices.recovery.max_bytes_per_sec": "1000mb" } }'
{
  "acknowledged": true,
  "persistent": {},
  "transient": {
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "1000mb"
      }
    }
  }
}

The Vulcanizer API is fairly simple and will also retrieve and return any existing setting for that key so that you can record the previous value.

v := vulcanizer.NewClient("localhost", 9200)
oldSetting, newSetting, err := v.SetSetting("indices.recovery.max_bytes_per_sec", "1000mb")
// "50mb", "1000mb", nil
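A minimal sketch of how a "set a value and return the previous one" helper could work over plain HTTP: read the current value with `GET /_cluster/settings?flat_settings=true` (so dotted keys such as `indices.recovery.max_bytes_per_sec` can be looked up directly, with transient values taking precedence over persistent ones), then marshal the new transient value instead of writing JSON by hand. `setSetting`, `previousValue`, and `transientBody` are hypothetical names, not Vulcanizer’s implementation. The `Content-Type: application/json` header is required by Elasticsearch 6.0 and later.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// clusterSettings is the shape of GET /_cluster/settings?flat_settings=true.
type clusterSettings struct {
	Persistent map[string]interface{} `json:"persistent"`
	Transient  map[string]interface{} `json:"transient"`
}

// previousValue returns the current value of key, checking transient before
// persistent because transient settings take precedence. "" means unset.
func previousValue(body []byte, key string) (string, error) {
	var s clusterSettings
	if err := json.Unmarshal(body, &s); err != nil {
		return "", err
	}
	for _, scope := range []map[string]interface{}{s.Transient, s.Persistent} {
		if v, ok := scope[key]; ok {
			return fmt.Sprint(v), nil
		}
	}
	return "", nil
}

// transientBody builds {"transient":{key:value}} so nobody writes JSON by hand.
func transientBody(key, value string) ([]byte, error) {
	return json.Marshal(map[string]map[string]string{
		"transient": {key: value},
	})
}

// setSetting (hypothetical) reads the old value, then PUTs the new one.
func setSetting(baseURL, key, value string) (oldValue string, err error) {
	resp, err := http.Get(baseURL + "/_cluster/settings?flat_settings=true")
	if err != nil {
		return "", err
	}
	raw, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET settings: unexpected status %s", resp.Status)
	}
	if oldValue, err = previousValue(raw, key); err != nil {
		return "", err
	}

	payload, err := transientBody(key, value)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPut, baseURL+"/_cluster/settings", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	put, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer put.Body.Close()
	if put.StatusCode != http.StatusOK {
		return "", fmt.Errorf("PUT settings: unexpected status %s", put.Status)
	}
	return oldValue, nil
}

func main() {
	old, err := setSetting("http://localhost:9200", "indices.recovery.max_bytes_per_sec", "1000mb")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("changed from %q to %q\n", old, "1000mb")
}
```

Returning the old value alongside the change makes it easy to paste into an incident log and to revert later with the same call.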

Move shards onto and off of a node

To safely update a node, you can set allocation rules so that data is migrated off a specific node. In the Elasticsearch settings, this is a comma-separated list of node names, so you’ll need to be careful not to overwrite an existing value when updating it.

$ curl -XPUT localhost:9200/_cluster/settings -d '
{
  "transient" : {
    "cluster.routing.allocation.exclude._name" : "vulcanizer-node-123,vulcanizer-node-456"
  }
}'

The Vulcanizer API will safely add or remove nodes from the exclude settings so that shards won’t allocate onto a node unexpectedly.

v := vulcanizer.NewClient("localhost", 9200)

// Existing exclusion settings:
// vulcanizer-node-123,vulcanizer-node-456

exclusionSettings1, err := v.DrainServer("vulcanizer-node-789")
// vulcanizer-node-123,vulcanizer-node-456,vulcanizer-node-789

exclusionSettings2, err := v.FillOneServer("vulcanizer-node-456")
// vulcanizer-node-123,vulcanizer-node-789
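The safety property described above comes down to treating the setting as a set of names rather than an opaque string. Here is a sketch of that list handling in isolation. `addExclusion` and `removeExclusion` are hypothetical helpers, not Vulcanizer’s `DrainServer` and `FillOneServer`, and the resulting string would still need to be written back to `cluster.routing.allocation.exclude._name`, for example with a transient settings update like the curl command above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitList parses a comma-separated setting value, dropping blanks and spaces.
func splitList(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// addExclusion (hypothetical) appends name unless it is already present,
// so existing exclusions are never overwritten.
func addExclusion(current, name string) string {
	names := splitList(current)
	for _, n := range names {
		if n == name {
			return strings.Join(names, ",")
		}
	}
	return strings.Join(append(names, name), ",")
}

// removeExclusion (hypothetical) drops name and keeps the rest in order.
func removeExclusion(current, name string) string {
	var kept []string
	for _, n := range splitList(current) {
		if n != name {
			kept = append(kept, n)
		}
	}
	return strings.Join(kept, ",")
}

func main() {
	current := "vulcanizer-node-123,vulcanizer-node-456"

	drained := addExclusion(current, "vulcanizer-node-789")
	fmt.Println(drained)
	// vulcanizer-node-123,vulcanizer-node-456,vulcanizer-node-789

	filled := removeExclusion(drained, "vulcanizer-node-456")
	fmt.Println(filled)
	// vulcanizer-node-123,vulcanizer-node-789
}
```

Adding a name that is already excluded is a no-op, so running the same drain command twice during an incident does not corrupt the list.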

Command-line application

Included is a small CLI application that leverages the library:

$ vulcanizer -h
Usage:
  vulcanizer [command]

Available Commands:
  allocation  Set shard allocation on the cluster.
  drain       Drain a server or see what servers are draining.
  fill        Fill servers with data, removing shard allocation exclusion rules.
  health      Display the health of the cluster.
  help        Help about any command
  indices     Display the indices of the cluster.
  nodes       Display the nodes of the cluster.
  setting     Interact with cluster settings.
  settings    Display all the settings of the cluster.
  snapshot    Interact with a specific snapshot.
  snapshots   Display the snapshots of the cluster.

Flags:
  -c, --cluster string      Cluster to connect to defined in config file
  -f, --configFile string   Configuration file to read in (default to "~/.vulcanizer.yaml")
  -h, --help                help for vulcanizer
      --host string         Host to connect to (default "localhost")
  -p, --port int            Port to connect to (default 9200)

Use "vulcanizer [command] --help" for more information about a command.

ChatOps

ChatOps is important for GitHub and our geographically distributed workforce. Vulcanizer enables us to build ChatOps tooling around Elasticsearch quickly and easily for common tasks:

jessbreckenridge

.vulcan-go health testbed
Hubot

The cluster is green: All primary and replica shards are allocated. This does NOT mean the cluster is otherwise healthy.
+--------+------------+------+------------+----------+
| STATUS | RELOCATING | INIT | UNASSIGNED | ACTIVE % |
+--------+------------+------+------------+----------+
| green  | 0          | 0    | 0          | 100      |
+--------+------------+------+------------+----------+
jessbreckenridge

.vulcan-go nodes testbed
Hubot

testbed nodes
+--------+------+------------------------+-----------+-----------+
| MASTER | ROLE |          NAME          |    JDK    |      IP   |
+--------+------+------------------------+-----------+-----------+
| *      | mi   | search-arbiter-aaaaaaa | 1.8.0_111 | 172.0.0.1 |
| -      | mi   | search-arbiter-bbbbbbb | 1.8.0_111 | 172.0.0.2 |
| -      | mi   | search-arbiter-ccccccc | 1.8.0_111 | 172.0.0.3 |
| -      | di   | search-storage-ddddddd | 1.8.0_111 | 172.0.0.4 |
| -      | di   | search-storage-eeeeeee | 1.8.0_111 | 172.0.0.5 |
| -      | di   | search-storage-eeeeeee | 1.8.0_111 | 172.0.0.6 |
+--------+------+------------------------+-----------+-----------+
nickcanz

.vulcan-go drain status testbed
Hubot

IP's filtered: None
Name Excluded: None
Host Excluded: None
nickcanz

.vulcan-go drain server search-storage-zzzzzzz testbed
Hubot

Server "search-storage-zzzzzzz" added to list of servers to drain.

Complete list servers draining:
    Ips: None
    Names: search-storage-zzzzzzz
    Hosts: None
nickcanz

.vulcan-go settings testbed
Hubot

testbed settings
No persistent settings are set.
+------------------------------------------+------------------------+
|            TRANSIENT SETTINGS            |         VALUE          |
+------------------------------------------+------------------------+
| cluster.routing.allocation.exclude._name | search-storage-zzzzzzz |
+------------------------------------------+------------------------+
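The health reply in the transcript above is a small projection of the `_cluster/health` API. Here is a sketch of that projection, assuming the standard `status`, `relocating_shards`, `initializing_shards`, `unassigned_shards`, and `active_shards_percent_as_number` response fields. The `ClusterHealth` type, `summarize`, and `statusMessage` are hypothetical; the green wording is copied from the reply above, while the yellow and red wordings are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterHealth holds the _cluster/health fields shown in the chat table.
// Hypothetical type for illustration.
type ClusterHealth struct {
	Status              string  `json:"status"`
	RelocatingShards    int     `json:"relocating_shards"`
	InitializingShards  int     `json:"initializing_shards"`
	UnassignedShards    int     `json:"unassigned_shards"`
	ActiveShardsPercent float64 `json:"active_shards_percent_as_number"`
}

// statusMessage explains what each status does (and does not) mean.
// Only the green text comes from the transcript; the others are illustrative.
func statusMessage(status string) string {
	switch status {
	case "green":
		return "The cluster is green: All primary and replica shards are allocated. This does NOT mean the cluster is otherwise healthy."
	case "yellow":
		return "The cluster is yellow: All primary shards are allocated, but at least one replica shard is not."
	case "red":
		return "The cluster is red: At least one primary shard is not allocated."
	default:
		return "Unknown cluster status: " + status
	}
}

// summarize decodes a _cluster/health body into an explanation and a table row
// with the columns STATUS | RELOCATING | INIT | UNASSIGNED | ACTIVE %.
func summarize(body []byte) (message, row string, err error) {
	var h ClusterHealth
	if err = json.Unmarshal(body, &h); err != nil {
		return "", "", err
	}
	row = fmt.Sprintf("| %s | %d | %d | %d | %.0f |",
		h.Status, h.RelocatingShards, h.InitializingShards, h.UnassignedShards, h.ActiveShardsPercent)
	return statusMessage(h.Status), row, nil
}

func main() {
	body := []byte(`{"cluster_name":"testbed","status":"green","relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"active_shards_percent_as_number":100.0}`)
	msg, row, err := summarize(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
	fmt.Println(row) // | green | 0 | 0 | 0 | 100 |
}
```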

Closing

We stumbled a bit when we first started down this path, but the end result is better for everyone:

  • Because we had to regroup around exactly what functionality we wanted to open source, we made sure we were providing value to ourselves and the community instead of just shipping something.
  • Internal tooling doesn’t always follow engineering best practices like proper release management, so developing Vulcanizer in the open provides an external pressure to make sure we follow all of the best practices.
  • Having all of the Elasticsearch functionality in its own library allows our internal applications to be very slim and isolated. Our internal applications have a clear dependency on Vulcanizer instead of depending on each other or, worse, trying to get ChatOps commands to talk to other ChatOps commands.

Visit the Vulcanizer repository to clone or contribute to the project. We have ideas for future development in the Vulcanizer roadmap.


