Learn about ghapi, a new third-party Python client for the GitHub API

Guest article by Jeremy Howard, founding researcher at fast.ai and lead developer of ghapi, and Hamel Husain, Staff Machine Learning Engineer at GitHub and contributor to ghapi.

Summary: Today fast.ai is releasing ghapi, the first Python library and CLI to provide complete, idiomatic access to the entire GitHub API, using a consistent interface. It includes tab-completion, integrated documentation, and many little touches to make your life easier, such as automatic pagination of responses, automatic management of all required headers, query strings, route parameters and post data, and much more.

The GitHub API

Like many developers, I spend a lot of time on GitHub. I maintain dozens of repositories and respond to issues and pull requests every day. I use GitHub Actions to automate testing, build software, triage issues, and even to power whole projects, such as fastpages.

That said, there are lots of things I want to automate or customize for my workflows as a maintainer. For instance, I’d love to be able to see at a glance every issue or PR that someone has replied to or added since I last checked. I want to be able to type a single command to close a bunch of issues with a template response. I want to be able to manage GitHub Actions workflows across all of my repositories with a single command. In general, I prefer using the command line for many things, so just about anything I do through the github.com web interface, I’d like to be able to do at the terminal.

Thankfully, GitHub offers an extensive API that provides programmatic access to nearly anything that you can do through the GitHub web interface or through the Git client. Furthermore, GitHub Actions allows code to respond to any GitHub event (such as a new pull request coming in or a software release being published). Thanks to the GitHub API, you can do things like:

Unfortunately, fully utilizing these features requires familiarity with multiple languages and technologies and raises some challenges:

  • The GitHub API is accessed through HTTP. Accessing HTTP APIs directly is clunky and requires careful study of every API endpoint. Debugging is difficult, because errors raise obscure HTTP status codes. There are software libraries that make this easier, but each library only handles a subset of the full API, and each one has a whole new interface to learn. Due to the size of the API, libraries generally do not provide full coverage and documentation for all API endpoints.
  • GitHub provides gh, a command line tool that integrates with GitHub, including access to some API functionality. However, this tool only covers a small subset of the full API’s functionality.
  • Developing locally and building extensions for GitHub Actions requires TypeScript, which many developers aren’t familiar with. Furthermore, Actions run in a special environment, which can slow iteration and debug cycles.
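To make the first point concrete, here is a minimal sketch of what creating an issue looks like when the REST endpoint is called directly with Python’s standard library. The owner, repository, and token values are placeholders, the helper name `build_create_issue_request` is ours, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_issue_request(owner, repo, token, title, body):
    """Build (but do not send) the HTTP request that creates an issue."""
    # The route parameters have to be spliced into the URL by hand.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    # The post data has to be serialized by hand.
    payload = json.dumps({"title": title, "body": body}).encode("utf-8")
    # The headers have to be remembered for every call.
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


# Placeholder values, for illustration only.
req = build_create_issue_request(
    "octocat", "hello-world", "PLACEHOLDER_TOKEN", "CI failed", "Build log attached."
)
print(req.get_method(), req.full_url)
```

Every endpoint needs its own variation of this boilerplate, which is the clunkiness described above.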

ghapi

We built ghapi, a Python library and CLI tool, to resolve these challenges. We spent a lot of time using ghapi for our own projects to ensure that the developer experience is consistent, intuitive, and Pythonic. We even used it to help some other developers with their projects, including GitHub CEO Nat Friedman’s ghtop project, as part of a code overhaul that Nat described as a “tour de force.” The “secret sauce” behind ghapi is GitHub’s new OpenAPI specification, a machine-readable catalog of every part of the GitHub API. We describe this in detail below. But first, let’s see what it looks like to actually use ghapi!

We’ll start by seeing ghapi in action in Python using Jupyter Notebook. You can use any development environment you like. Jupyter is a particularly good choice, because ghapi uses its rich display functionality to make life easier for you.

Let’s say you want to write code to create an issue (for instance, perhaps you are writing something to create an issue whenever continuous integration fails). In order to find the API operation you need, you can search ghapi’s full, categorized API reference. In the issues section, we find issues.create, which shows how to call the method. Alternatively, you can use ghapi’s tab completion to look through the available operations and groups, and even get a link straight to the official GitHub API documentation, as shown in this example:

 

You can also type the name of a group, to show all operations in that group, along with parameter details and a link to the documentation:

GitHub’s API generally provides responses as JSON, which isn’t that easy to read at a glance. ghapi formats all JSON responses automatically in a human readable format using bulleted lists. For instance, displaying an issue:
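The rendering itself is not reproduced here. As a rough illustration of the idea only (this is not ghapi’s implementation, the function name `to_bullets` is ours, and the sample issue fields are made up), nested JSON can be turned into an indented bulleted list like this:

```python
def to_bullets(value, indent=0):
    """Render nested dicts and lists as indented bullet lines."""
    pad = "  " * indent
    if isinstance(value, dict):
        items = [(f"{key}: ", item) for key, item in value.items()]
    elif isinstance(value, list):
        items = [("", item) for item in value]
    else:
        return [f"{pad}- {value}"]

    lines = []
    for label, item in items:
        if isinstance(item, (dict, list)):
            # Containers get a heading bullet, then their children one level deeper.
            lines.append(f"{pad}- {label}".rstrip())
            lines.extend(to_bullets(item, indent + 1))
        else:
            lines.append(f"{pad}- {label}{item}")
    return lines


# A made-up, heavily trimmed issue payload.
sample_issue = {
    "number": 1,
    "title": "Docs typo",
    "user": {"login": "octocat"},
    "labels": ["bug", "docs"],
}
bullet_lines = to_bullets(sample_issue)
print("\n".join(bullet_lines))
```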

ghapi reduces boilerplate for developers. For instance, arguments are inserted dynamically as needed, if you provide them at instantiation (such as the repository name and owner name), and custom headers (such as for preview endpoints) are handled automatically.
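One way to picture the dynamic insertion of arguments is the sketch below. It is an assumption-laden illustration, not ghapi’s code: `MiniClient` and `build_call` are hypothetical names, and the only real detail is the issue-creation route from GitHub’s REST API. Values given at instantiation fill any route parameter the caller leaves out, and everything else is treated as request data.

```python
import string


class MiniClient:
    """Hypothetical client that remembers defaults such as owner and repo."""

    def __init__(self, **defaults):
        self.defaults = defaults

    def build_call(self, verb, route, **kwargs):
        # Find the {placeholders} in the route template.
        names = [name for _, name, _, _ in string.Formatter().parse(route) if name]
        # Explicit arguments win over the defaults given at instantiation.
        merged = {**self.defaults, **kwargs}
        missing = [name for name in names if name not in merged]
        if missing:
            raise TypeError(f"missing route parameters: {missing}")
        path = route.format(**{name: merged[name] for name in names})
        # Whatever is not a route parameter is sent as request data.
        data = {key: val for key, val in kwargs.items() if key not in names}
        return verb, path, data


client = MiniClient(owner="octocat", repo="hello-world")
call = client.build_call("POST", "/repos/{owner}/{repo}/issues", title="CI failed")
print(call)
```

The caller never repeats the owner or repository name, yet can still override either one for a single call.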

Integrated methods

ghapi doesn’t just provide direct access to the GitHub API; it also wraps a lot of functionality in Pythonic helpers.

For instance, many GitHub API calls are paginated, which normally requires that developers manually figure out how many pages to loop through and call a different REST endpoint for each page. In ghapi, pagination is always available as iterators, so you can just use a for loop, list comprehension or whatever approach you like for retrieving each page. ghapi allows you to easily turn any paged result into an iterator with the paged function. For instance, to get a list of all the repositories in the GitHub organization:
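The general pattern behind a paged iterator can be sketched as follows. This is a simplified stand-in rather than ghapi’s own `paged` function: `paged_sketch` and `fake_list_for_org` are hypothetical names, and the fake operation returns slices of a local list instead of calling GitHub. The `per_page` and `page` arguments mirror GitHub’s pagination query parameters.

```python
def paged_sketch(operation, per_page=30, max_pages=9999, **kwargs):
    """Yield one page of results at a time until an empty page comes back."""
    for page in range(1, max_pages + 1):
        results = operation(per_page=per_page, page=page, **kwargs)
        if not results:
            return
        yield results


# Stand-in for a real list operation: serves slices of a local list.
FAKE_REPOS = [f"repo-{i}" for i in range(7)]


def fake_list_for_org(org, per_page=30, page=1):
    start = (page - 1) * per_page
    return FAKE_REPOS[start:start + per_page]


pages_seen = list(paged_sketch(fake_list_for_org, per_page=3, org="github"))
all_repos = [repo for page in pages_seen for repo in page]
print(len(pages_seen), "pages,", len(all_repos), "repositories")
```

Because the result is a generator, nothing is requested until the loop asks for the next page.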

Better still, ghapi even lets you grab all available pages at once in parallel and concatenates the results for you. You can read more about this functionality here.
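A minimal sketch of the idea, assuming the number of pages is already known (GitHub’s REST API advertises the last page in the `Link` response header). The names `fetch_all_pages` and `fake_list_for_org` are hypothetical, and this is not ghapi’s implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_pages(operation, n_pages, per_page=30, max_workers=8, **kwargs):
    """Fetch pages 1..n_pages concurrently and concatenate them in page order."""

    def fetch(page):
        return operation(per_page=per_page, page=page, **kwargs)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so pages stay sorted.
        page_results = list(pool.map(fetch, range(1, n_pages + 1)))
    return [item for page in page_results for item in page]


# Stand-in for a real list operation: serves slices of a local list.
FAKE_ITEMS = list(range(10))


def fake_list_for_org(org, per_page=30, page=1):
    start = (page - 1) * per_page
    return FAKE_ITEMS[start:start + per_page]


combined = fetch_all_pages(fake_list_for_org, n_pages=4, per_page=3, org="github")
print(combined)
```

Threads suit this job because each page request spends most of its time waiting on the network.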

This parallel pages functionality is then leveraged in ghapi.event, a module that provides convenient access to the Events API. Normally, using this API requires understanding the special headers that track quota use, using multiple different REST endpoints depending on what events you need, handling multi-page requests, avoiding repeated events, and so forth. ghapi.event handles all that for you, with the fetch_events generator, which provides a continuous stream of deduplicated events, filtered (optionally) to remove bot activity, along with additional filters chosen by the developer.
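The deduplication and filtering step can be pictured with the sketch below. It is an illustration under stated assumptions, not the code behind `fetch_events`: the function name `dedupe_events` is ours, the sample events are made up, and treating logins that end in `[bot]` as bots is a simplifying heuristic.

```python
def dedupe_events(batches, include_bots=False, types=None):
    """Yield each event once across overlapping batches, with optional filters."""
    seen = set()  # Grows without bound; a long-running stream would want to cap it.
    for batch in batches:
        for event in batch:
            if event["id"] in seen:
                continue
            seen.add(event["id"])
            if not include_bots and event["actor"]["login"].endswith("[bot]"):
                continue
            if types and event["type"] not in types:
                continue
            yield event


# Two made-up polls of an events endpoint that overlap on event "2".
poll_1 = [
    {"id": "1", "type": "PushEvent", "actor": {"login": "octocat"}},
    {"id": "2", "type": "IssuesEvent", "actor": {"login": "dependabot[bot]"}},
]
poll_2 = [
    {"id": "2", "type": "IssuesEvent", "actor": {"login": "dependabot[bot]"}},
    {"id": "3", "type": "PushEvent", "actor": {"login": "hubot"}},
]

human_ids = [event["id"] for event in dedupe_events([poll_1, poll_2])]
issue_ids = [
    event["id"]
    for event in dedupe_events([poll_1, poll_2], include_bots=True, types={"IssuesEvent"})
]
print(human_ids, issue_ids)
```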

Many features of the GitHub API are wrapped into convenient integrated methods for you. For instance, editing a file directly in GitHub can be done with a single call:

The CLI

ghapi also includes a CLI, which offers the same features as the Python interface demonstrated above. Exploration of endpoints through tab completion is available in the CLI as well. Here’s a demonstration of this at work:

 

The CLI uses exactly the same operation and parameter names as the Python API, so you can move between the two easily. The full API documentation is available using the --help flag, for instance:

For more information on how to use and configure the ghapi CLI, see these docs.

GitHub Actions

GitHub Actions help you automate tasks in your software development workflow. As the docs explain:

GitHub Actions are event driven, meaning that you can run a series of commands after a specified event has occurred. For example, every time someone creates a pull request for a repository, you can automatically run a command that executes a software testing script.

They are a powerful system for automating many software development processes, and open source projects get unlimited compute hours for free! However, workflows can be difficult to create and debug, because:

  • They run in a custom environment that can be difficult to replicate locally.
  • Testing workflows often involves creating real events, and it can take minutes before your updated workflow runs and you can see the results. This often leads to slow debug and development cycles.
  • It can be difficult to develop interactively and iteratively.
  • The canonical way to define a workflow is to write YAML, complemented by special syntax specific to the Actions runtime. However, developers may want to use the full expressiveness of programming languages they are comfortable with instead.
  • Building full-featured extensions requires using TypeScript, which is not a language that many developers are accustomed to using for scripting, sysadmin, and continuous integration tasks.

However, with ghapi you can write your workflows entirely in Python, doing nearly all your development on your local machine. For example, if you want to create a workflow that automatically replies to all pull requests with a comment that says “thank you,” you would first use the following command to create the new workflow:

That will create a Python file for you, which you can then edit to add your script, for instance:

That’s it! You can use the full power of the GitHub API and Python to do whatever you want, in a way that minimizes the use of syntax specific to GitHub Actions. With example payloads available during development for all event types, you can test your scripts on your own machine. You can read more about this functionality here.
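As a rough sketch of what testing against an example payload on your own machine can look like, the code below reads an event from a JSON file and decides whether to post the comment. It relies on two facts about GitHub itself: Actions exposes the event payload through the `GITHUB_EVENT_PATH` environment variable, and the `pull_request` payload carries top-level `action` and `number` fields. Everything else is an assumption for illustration: the helper names, the trimmed payload, and `StubApi`, which stands in for any client exposing `issues.create_comment(issue_number=..., body=...)`.

```python
import json
import os
import tempfile


def load_event(path=None):
    """Load the event payload from a file, defaulting to the Actions-provided path."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def thank_contributor(api, event, body="Thank you for your contribution!"):
    """Comment on newly opened pull requests; ignore every other event."""
    if event.get("action") != "opened" or "pull_request" not in event:
        return None
    return api.issues.create_comment(issue_number=event["number"], body=body)


class StubIssues:
    """Records comments instead of sending them, for local testing."""

    def __init__(self):
        self.comments = []

    def create_comment(self, issue_number, body):
        self.comments.append((issue_number, body))
        return {"issue_number": issue_number, "body": body}


class StubApi:
    def __init__(self):
        self.issues = StubIssues()


# A made-up, heavily trimmed pull_request payload written to a temporary file.
sample_payload = {"action": "opened", "number": 7, "pull_request": {"title": "Fix typo"}}
with tempfile.TemporaryDirectory() as tmp:
    event_file = os.path.join(tmp, "event.json")
    with open(event_file, "w", encoding="utf-8") as f:
        json.dump(sample_payload, f)
    event = load_event(event_file)

stub = StubApi()
thank_contributor(stub, event)
print(stub.issues.comments)
```

Swapping the stub for a real client is the only change needed to run the same logic inside a workflow.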

OpenAPI

The key to ghapi’s power and scope is the OpenAPI specification of the GitHub API. OpenAPI gives API providers a way to publish a complete machine-readable specification of their APIs, which can then be used to automate the creation of libraries that access those APIs. Creating a GitHub OpenAPI spec, fully and precisely documenting a 12-year-old API covering thousands of features, is a huge task. Having spent a lot of time studying this spec, I can see that a great deal of care and thought has been put into its design. I have not come across a single missing parameter or a single error. The operations are sensibly and consistently named, and the whole API is carefully categorized.
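To give a feel for what “machine-readable” means here, the sketch below walks a tiny OpenAPI-style document and lists its operations. The fragment is hand-written and heavily simplified for illustration (the real spec is far larger and uses shared references for parameters), and `list_operations` is a hypothetical helper.

```python
import json

# Hand-written, simplified fragment in the shape of an OpenAPI "paths" section.
SPEC_FRAGMENT = json.loads("""
{
  "paths": {
    "/repos/{owner}/{repo}/issues": {
      "post": {
        "operationId": "issues/create",
        "summary": "Create an issue",
        "parameters": [
          {"name": "owner", "in": "path"},
          {"name": "repo", "in": "path"}
        ]
      },
      "get": {
        "operationId": "issues/list-for-repo",
        "summary": "List repository issues",
        "parameters": [
          {"name": "owner", "in": "path"},
          {"name": "repo", "in": "path"},
          {"name": "state", "in": "query"}
        ]
      }
    }
  }
}
""")


def list_operations(spec):
    """Flatten the paths section into one record per operation."""
    operations = []
    for path, verbs in spec["paths"].items():
        for verb, op in verbs.items():
            # Operation ids have the form "group/name".
            group, name = op["operationId"].split("/", 1)
            params = op.get("parameters", [])
            operations.append({
                "group": group,
                "name": name.replace("-", "_"),  # make it a valid Python identifier
                "verb": verb.upper(),
                "path": path,
                "route_params": [p["name"] for p in params if p["in"] == "path"],
                "query_params": [p["name"] for p in params if p["in"] == "query"],
            })
    return operations


ops = list_operations(SPEC_FRAGMENT)
for op in ops:
    print(op["group"], op["name"], op["verb"], op["path"])
```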

ghapi is one of the first GitHub OpenAPI libraries for any language, and it fully utilizes Python’s dynamic features to do things that no other REST API library we’ve seen can do. For instance, most OpenAPI libraries use code generation to create separate methods and data types for each part of a spec. This is the approach used, for instance, by the experimental octo-go client. Whilst it gets the job done, code generation like this tends to be very verbose. For instance, the octo-go wrapper for the methods from just one group of operations (the repo group) is nearly 20,000 lines of code! In a static language like Go, there isn’t much choice, since the compiler has to be told about every data type and every function.

The entire ghapi package, on the other hand, weighs in at around 40kB! The trick is that we created a Python module with the key information from the spec and then wrote classes that dynamically create methods by referencing that module. There are a lot of neat tricks involved in making this work, which we’ll explain in detail in a future post.
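Until that post arrives, here is one way the general technique can be sketched. It is our own simplified illustration, not ghapi’s source: the `OPERATIONS` table, `Group`, and `DynamicClient` are hypothetical names, and the `send` callable stands in for the code that would perform the HTTP request.

```python
# Hypothetical metadata table playing the role of the module built from the spec.
OPERATIONS = {
    "issues": {
        "create": ("POST", "/repos/{owner}/{repo}/issues"),
        "list_for_repo": ("GET", "/repos/{owner}/{repo}/issues"),
    },
    "repos": {
        "get": ("GET", "/repos/{owner}/{repo}"),
    },
}


class Group:
    """A group of operations whose methods are created on attribute access."""

    def __init__(self, client, operations):
        self._client = client
        self._operations = operations

    def __dir__(self):
        # Listing the names here is what lets tab completion discover them.
        return list(self._operations)

    def __getattr__(self, name):
        try:
            verb, route = self._operations[name]
        except KeyError:
            raise AttributeError(name) from None

        def call(**kwargs):
            return self._client.request(verb, route, **kwargs)

        call.__name__ = name
        return call


class DynamicClient:
    def __init__(self, send, **defaults):
        self._send = send
        self._defaults = defaults

    def __dir__(self):
        return list(OPERATIONS)

    def __getattr__(self, name):
        if name in OPERATIONS:
            return Group(self, OPERATIONS[name])
        raise AttributeError(name)

    def request(self, verb, route, **kwargs):
        # Fill the route from the defaults plus the call's own arguments.
        path = route.format(**{**self._defaults, **kwargs})
        return self._send(verb, path, kwargs)


# The send callable just echoes what it would have sent.
client = DynamicClient(lambda verb, path, data: (verb, path, data),
                       owner="octocat", repo="hello-world")
result = client.issues.create(title="CI failed")
print(result)
```

No method is written out by hand: adding an operation means adding one row to the table, which is why this approach stays so much smaller than generated code.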

Credits

Thanks to Rachel Thomas for help in writing this post.