Git Concurrency in GitHub Desktop

Careful use of concurrency is particularly important when writing responsive
desktop applications. Typically, complex operations are executed on background
threads. This results in an app that remains responsive to user input, while
still performing complex tasks.

In GitHub Desktop, many background threads may read from or write to
the same Git repository at the same time.

However, git is typically not used in a concurrent fashion. When using git via
the command line, operations are executed in a sequential manner. Read or write
operations are performed against git, independently of each other.

[Figure: Commands are executed in serial, on the command line]

While building GitHub Desktop, we discovered that executing git commands serially
was a one-way ticket to an unresponsive app. For example, waiting
to load diffs until after we’ve counted the number of commits in the history
would make the application slow and unresponsive.

To maintain correctness and a responsive user interface, we needed a
solution to concurrency control.

Git, libgit2 and concurrency

GitHub Desktop has two methods of interacting with a git repository.

  • Calling into C implementations of the Git core methods via libgit2
  • Shelling out to the git command line interface

We would like to use libgit2 for all of our git operations because it is faster and easier to program with. Unfortunately it is not yet a complete implementation, so we use the CLI to fill in the missing functionality.
This poses an interesting problem: git and libgit2 take different
approaches to concurrency control.

Git implements a pessimistic approach to concurrency control. Lock files are used to prevent concurrent access to the underlying git objects on disk. When performing an operation against a git object, git will create a *.lock
file inside the .git directory. This signals that the * object is locked for
use. Further operations are prevented until the lock is released and the *.lock file is deleted.
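
The lock-file idea can be sketched in a few lines of C#. This is a minimal illustration rather than git's own code: the LockFile class and its TryAcquire method are hypothetical names, and the only mechanism relied upon is that FileMode.CreateNew fails when the file already exists.

```csharp
using System;
using System.IO;

// Hypothetical helper: illustrates the lock-file idea only; it is not git's code.
public sealed class LockFile : IDisposable
{
    private readonly string lockPath;
    private FileStream stream;

    private LockFile(string lockPath, FileStream stream)
    {
        this.lockPath = lockPath;
        this.stream = stream;
    }

    // Returns null when "<path>.lock" already exists or cannot be created.
    public static LockFile TryAcquire(string path)
    {
        var lockPath = path + ".lock";
        try
        {
            // FileMode.CreateNew throws IOException if the file already exists,
            // so creating the file doubles as an atomic "test and set".
            var stream = new FileStream(lockPath, FileMode.CreateNew,
                                        FileAccess.Write, FileShare.None);
            return new LockFile(lockPath, stream);
        }
        catch (IOException)
        {
            return null;
        }
    }

    // Releasing the lock means deleting the lock file.
    public void Dispose()
    {
        if (stream == null) return;
        stream.Dispose();
        stream = null;
        File.Delete(lockPath);
    }
}
```

A caller would wrap its work in `using (var held = LockFile.TryAcquire(path)) { ... }` and back off or retry when the result is null, which is the pessimistic behaviour described above: a second writer is refused until the first one has removed the lock file.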

By contrast, libgit2 cannot guarantee that objects can safely be shared between threads. Mutable data structures in libgit2 are not thread safe, and operations must be performed carefully. The libgit2 API allows you to compose granular operations together, and granular locking would come at a performance cost. Because libgit2 data structures are rarely used in isolation, concurrency control should be implemented at the level of a collection of fine-grained operations, that is, a single unit of work.
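
To make the "unit of work" granularity concrete, here is a small sketch that takes one lock around a whole multi-step operation instead of around each step. UnitOfWorkGate and Counter are hypothetical names invented for this illustration, and the sketch uses Tasks and SemaphoreSlim from the base class library rather than libgit2 or Rx.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: serializes whole units of work, not individual calls.
public sealed class UnitOfWorkGate
{
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    public async Task<T> RunAsync<T>(Func<Task<T>> unitOfWork)
    {
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            // Every fine-grained step inside unitOfWork runs under one lock,
            // even if the work awaits and resumes on a different thread.
            return await unitOfWork().ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}

// Stand-in for a mutable structure that is not thread safe.
public sealed class Counter
{
    public int Value;
}
```

A unit of work that reads `counter.Value`, awaits something, and then writes the incremented value back stays correct when many copies are submitted through `RunAsync`, because the read and the write are never interleaved with another unit. Locking each step separately would cost more and still allow that interleaving.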

A new concurrency model

GitHub Desktop ships as a native application on both Mac and Windows. The Mac app is implemented in Objective-C, while the Windows app is implemented in C#. Both platforms are implemented in a reactive style, using Microsoft’s Reactive Extensions (Rx) and our own ReactiveCocoa (RAC). This allows the composition of background tasks, such as executing git operations. All git operations are executed asynchronously and across thread boundaries.

To ensure GitHub Desktop executes git operations in a safe yet performant manner, we needed a new concurrency model that enabled us to:

  • Organize work at the level of asynchronous Observables (Rx) and Signals (RAC)
    instead of synchronous blocks of code.
  • Perform most operations concurrently.
  • Retain the ability to perform destructive operations serially and exclusively, as required by Git or libgit2.

Concurrent and exclusive locks

Each high level operation GitHub Desktop performs can be thought of as a unit of work.
A single unit of work can be made up of many fine-grained operations. Our units
of work can be categorized as either:

  • Concurrent operations
  • Exclusive operations

Concurrent and exclusive operations don’t always have
a 1:1 relationship with reading and writing to the underlying repository.
For example, it is safe to write Git refs concurrently with other work,
because a ref update is atomic. On the other hand, some read operations may
update caches in an unsafe way, and so those need to be performed exclusively.

GitHub Desktop uses an AsyncReaderWriterLock as a queue, upon
which operations can be run either exclusively or in parallel. Exclusive operations behave
like a barrier, waiting for previously-enqueued work to complete before beginning,
and themselves finishing before any further work starts.
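
The barrier behaviour can be sketched with plain Tasks. This is not the AsyncReaderWriterLock used in GitHub Desktop: BarrierQueue is a hypothetical class written for this illustration, it borrows only the two method names from the article, and it takes Func<Task<T>> instead of observables so that it needs nothing beyond the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the idea behind AsyncReaderWriterLock; not the real class.
public sealed class BarrierQueue
{
    private readonly object sync = new object();
    // Concurrent operations enqueued since the last exclusive operation.
    // (A real implementation would prune completed tasks from this list.)
    private readonly List<Task> concurrent = new List<Task>();
    // The most recently enqueued exclusive operation.
    private Task lastExclusive = Task.CompletedTask;

    // Starts once the last exclusive operation has finished and may
    // overlap with other concurrent operations.
    public Task<T> AddConcurrentOperation<T>(Func<Task<T>> operation)
    {
        lock (sync)
        {
            var barrier = lastExclusive;
            var task = Task.Run(() => RunAfter(barrier, operation));
            concurrent.Add(task);
            return task;
        }
    }

    // Starts once everything enqueued earlier has finished; everything
    // enqueued later waits for it.
    public Task<T> AddExclusiveOperation<T>(Func<Task<T>> operation)
    {
        lock (sync)
        {
            var earlier = new List<Task>(concurrent) { lastExclusive };
            var barrier = Task.WhenAll(earlier);
            var task = Task.Run(() => RunAfter(barrier, operation));
            concurrent.Clear();
            lastExclusive = task;
            return task;
        }
    }

    private static async Task<T> RunAfter<T>(Task predecessor, Func<Task<T>> operation)
    {
        try { await predecessor.ConfigureAwait(false); }
        catch { /* a failed predecessor must not block later work */ }
        return await operation().ConfigureAwait(false);
    }
}
```

Each returned task completes only when the operation's own task completes, so an operation that awaits and gives up its thread still holds its place in the queue. If a concurrent, an exclusive, and another concurrent operation are enqueued in that order, the exclusive one starts after the first has finished and the last one starts after the exclusive one has finished.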

To execute Git operations, the appropriate lock must first be acquired.

public class RepositoryConnection
{
  readonly string localDotGitPath;
  readonly AsyncReaderWriterLock readerWriterLock;

  public RepositoryConnection(string dotGitPath, AsyncReaderWriterLock rwLock)
  {
    localDotGitPath = dotGitPath;
    readerWriterLock = rwLock;
  }

  public IObservable<T> OpenConcurrentConnection<T>(Func<IConcurrentRepositoryConnection, IObservable<T>> operation)
  {
     var connection = Observable.Defer(() =>
     {
         // create a new libgit2 repository object for a given path on disk
         var repo = new Repository(localDotGitPath, new RepositoryOptions());
         return Observable.Return(new ConcurrentRepositoryConnection(repo));
     });

     // defer the given operation, and close the connection on error and complete
     var executeAndClose = connection.SelectMany(conn => Observable.Defer(() => operation(conn))
                                     .Do(x => {}, ex => conn.CloseConnection(), () => conn.CloseConnection()));

     // Add it to the concurrent lock queue.
     return readerWriterLock.AddConcurrentOperation(executeAndClose);
  }

  public IObservable<T> OpenExclusiveConnection<T>(Func<IExclusiveRepositoryConnection, IObservable<T>> operation)
  {
     var connection = Observable.Defer(() =>
     {
         // create a new libgit2 repository object for a given path on disk
         var repo = new Repository(localDotGitPath, new RepositoryOptions());
         return Observable.Return(new ExclusiveRepositoryConnection(repo));
     });

     // defer the given operation, and close the connection on error and complete
     var executeAndClose = connection.SelectMany(conn => Observable.Defer(() => operation(conn))
                                     .Do(x => {}, ex => conn.CloseConnection(), () => conn.CloseConnection()));

     // Add it to the exclusive lock queue.
     return readerWriterLock.AddExclusiveOperation(executeAndClose);
  }
}

Inside GitHub Desktop we define two interfaces. In C# these are IExclusiveRepositoryConnection.cs and IConcurrentRepositoryConnection.cs, while in Objective-C they are defined by GHExclusiveGitConnection.h and GHGitConnection.h. Each interface allows only the git operations that make sense for its lock type.

The ExclusiveRepositoryConnection defines only operations which must be performed with exclusive access to the underlying repository object. Likewise, the ConcurrentRepositoryConnection defines only operations that are safe to run concurrently. This means it is impossible to execute exclusive operations concurrently, or concurrent operations exclusively. In this way we are able to prevent possible data corruption without a performance trade-off.

public class ExclusiveRepositoryConnection : IExclusiveRepositoryConnection
{
  private IRepository repository;

  public ExclusiveRepositoryConnection(IRepository repository)
  {
    this.repository = repository;
  }

  public IObservable<Unit> Commit()
  {
    // Commit to the repository
  }

  public IObservable<Unit> CloseConnection()
  {
    // Execute any required clean up
  }

  //... More exclusive operations
}

For concurrent operations, we define a similar class.

public class ConcurrentRepositoryConnection : IConcurrentRepositoryConnection
{
  private IRepository repository;

  public ConcurrentRepositoryConnection(IRepository repository)
  {
    this.repository = repository;
  }

  public IObservable<Unit> Fetch()
  {
    // Execute a fetch against the repository
  }

  public IObservable<Sha> FindMergeBase(Sha one, Sha two)
  {
    // Calculate the merge base of two SHAs
  }

  public IObservable<Unit> CloseConnection()
  {
    // Execute any required clean up
  }

  //... More concurrent operations
}

Below is an example of how we might execute a fetch, calculate a merge base and create a commit. Each operation is executed asynchronously using Reactive Extensions, and inside either a concurrent or an exclusive lock. In this example, each operation is queued according to the kind of lock requested. Both Fetch and FindMergeBase will execute concurrently with respect to each other. However, Commit will be queued until all currently executing operations have completed. No subsequent operation will execute until the Commit has completed.


public class LockExample
{
  private RepositoryConnection repositoryConnection;

  public LockExample(RepositoryConnection repositoryConnection)
  {
    this.repositoryConnection = repositoryConnection;
  }

  public void DoWork(Sha first, Sha second)
  {
    // Fetch and FindMergeBase may run at the same time.
    repositoryConnection.OpenConcurrentConnection(connection => connection.Fetch())
                        .Subscribe(_ => { }, () => Console.WriteLine("Fetch Completed"));

    repositoryConnection.OpenConcurrentConnection(connection => connection.FindMergeBase(first, second))
                        .Subscribe(_ => { }, () => Console.WriteLine("Find Merge Base Completed"));

    // Commit waits for the two operations above and blocks anything queued after it.
    repositoryConnection.OpenExclusiveConnection(connection => connection.Commit())
                        .Subscribe(_ => { }, () => Console.WriteLine("Commit Completed"));
  }
}

Unlike a queue of synchronous work as you might find in Apple’s Grand Central
Dispatch or Clojure’s core.async, we treat
our asynchronous and thread-hopping operations as atomic units of work.
This means that even if we relinquish all threads while waiting for some data,
our queue doesn’t actually move onto the next thing until the operation says it’s well and truly completed.

The impact

Before these changes, GitHub Desktop suffered from race conditions as units
of work would become interleaved in error.

Since implementing the concurrent/exclusive locks we have seen an improvement
in stability and performance. We now have a way to talk about concurrency control
at a higher level: that of a single unit of work.

By carefully managing git concurrency, GitHub Desktop protects your repositories
from possible corruption. The end result is an app that remains responsive,
while putting the integrity of your repository first.
