Introducing the GHES repository cache
If you’re a GitHub Enterprise Server (GHES) customer with heavy read traffic on your monorepo, check out the repository cache, especially if you have CI workloads distributed around the world.
Continuous integration (CI) runners often drive the lion’s share of load on a Git server. They may build every merge to main, every pull request, or even every single commit, and all of these patterns mean frequent clones and pulls. A large enough farm of CI runners can cause slowdowns for the humans using Git. A related problem for large organizations is geo-distribution, such as a CI farm in Europe pulling from a Git server in North America. Each runner has to pull each change over the same high-latency, possibly expensive, link. It can be cheaper, faster, and more reliable for a single machine to pull the changes from North America once and then distribute them to the runners locally in Europe.
Since Git traffic isn’t suitable for classic content delivery networks, and geo-replicas can cause slowdowns for developers trying to push data, large organizations need something different. Enter the GitHub Enterprise Server repository cache, now available in public beta.
The repository cache is an eventually consistent replica of your Git data, updated by a background job. It offers the data locality and convenience of a geo-replica, but it doesn’t participate in the initial receipt of Git data, so developer push workflows aren’t affected.
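One way to route a CI runner’s reads through its nearest cache, without touching individual pipeline definitions, is Git’s built-in URL rewriting. Here’s a minimal sketch, assuming hypothetical hostnames for the primary (`github.example.com`) and the cache (`europe-cache.example.com`):

```bash
# Hypothetical hostnames: github.example.com is the GHES primary,
# europe-cache.example.com is the local repository cache.

# Rewrite fetch/clone URLs so reads hit the nearby cache...
git config --global url."https://europe-cache.example.com/".insteadOf "https://github.example.com/"

# ...but keep pushes pointed at the primary, since the cache is read-only.
git config --global url."https://github.example.com/".pushInsteadOf "https://github.example.com/"
```

With `pushInsteadOf` set, any write a job does perform still goes to the primary.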
Repository caches offer a second advantage: selective replication. You choose exactly which repositories are replicated to which caches. Several enterprise customers have expressed a need for strict data residency: some repositories may be replicated elsewhere, while others must remain solely in their home location. Replication policies let you identify these repositories and honor those restrictions.
There are a few tradeoffs when using a repository cache. Repository caches are purely read-only; all writes have to go to the primary. This is why we recommend that only CI farms use repository caches and that developers continue to work against the primary. We’ve detailed other strategies for speeding up developer workflows with large monorepos, and those strategies are compatible with offloading CI traffic to repository caches. Also, it can take several minutes for new Git data to arrive at the cache, so readers must implement a backoff-and-retry strategy for commits they expect but don’t find immediately.
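Here’s a minimal sketch of such a retry loop, assuming the runner’s `origin` remote points at the cache; the script name, SHA argument, and backoff schedule are illustrative, not a prescribed interface:

```bash
#!/usr/bin/env bash
# Wait for an expected commit to appear on the cache before building.
# Usage: wait-for-commit.sh <sha>
set -euo pipefail

sha="$1"
for delay in 5 10 20 40 80; do
  # "origin" is assumed to point at the repository cache, not the primary.
  git fetch --quiet origin
  # -e checks that the object exists; ^{commit} insists it is a commit.
  if git cat-file -e "${sha}^{commit}" 2>/dev/null; then
    echo "Commit ${sha} is available on the cache; starting build."
    exit 0
  fi
  echo "Commit ${sha} not on the cache yet; retrying in ${delay}s."
  sleep "${delay}"
done
echo "Gave up waiting for ${sha}." >&2
exit 1
```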
This feature shipped in beta with GHES 3.3 and has been improved in GHES 3.4. There are a handful of known issues, listed below, but if these aren’t showstoppers for you, we’d love to have you try it out and share feedback with us. Get started by visiting the documentation.
Known cache server issues in GHES 3.3 and 3.4:
- LFS files are not replicated. Instead, the cache server acts as a proxy for LFS and streams the content from the primary instance. If you make heavy use of LFS, you may not reap much benefit from the cache server in its current state. LFS caching will be implemented in a future release of the cache server.
- Repositories excluded by replication policy may still be streamed through the cache server. It’s important to note that the cache server is not an authorization boundary: if a user with access to a repository requests it through a cache server, the cache server will proxy the data, but will not store it. This behavior will be changed in a future release of the cache server.