Highlights from Git 2.19

Image of Taylor Blau

The open source Git project just released Git 2.19, with features and bug-fixes from over 60 contributors. Here’s a look at some of the most interesting features introduced in the latest versions of Git.

Compare histories with git range-diff

You might have used git rebase, which is a powerful tool for rewriting history by altering commits, commit order, or branch bases, to name a few. Many people do this to “polish” a series of commits before proposing to merge them into a project. But how can we visualize the differences between two sets of commits, before and after a rebase?

We can use git diff to show the difference between the two end states, but that doesn’t provide information about the individual commits. And if the base on which the commits were built has changed, the resulting state might be quite different, even if the changes in the commits are largely the same.

Git 2.19 introduces git range-diff, a tool for comparing two sequences of commits, including changes to their order, commit messages, and the actual content changes they introduce.

git range-diff example

In this example, we rewrote a series of three commits, and compared the tips of each version using git range-diff. git range-diff shows that we moved the commit introducing README.md to be first instead of second, amended both the commit message and body of the typo fix, and introduced a new commit to add a missing newline.
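To try this yourself, here is a minimal sketch that builds a throwaway repository with two versions of a short series and compares them. The branch names (base, topic-v1, topic-v2), the file names, and the commit identity are made up for illustration, and the sketch assumes Git 2.19 or later. It uses the three-argument form, git range-diff <base> <old-tip> <new-tip>.

```shell
# Throwaway repository; every name and file below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch base

# Version 1 of the series: a README with a typo, then a source file.
git checkout -q -b topic-v1 base
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > README.md
echo "Helo world" >> README.md
git add README.md && git commit -q -m "Add README"
echo "int x;" > main.c
git add main.c && git commit -q -m "Add main.c"

# Version 2: the same series rebuilt from base, with the typo fixed.
git checkout -q -b topic-v2 base
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > README.md
echo "Hello world" >> README.md
git add README.md && git commit -q -m "Add README"
echo "int x;" > main.c
git add main.c && git commit -q -m "Add main.c"

# Compare the two versions commit by commit.
range_diff=$(git range-diff base topic-v1 topic-v2)
printf '%s\n' "$range_diff"
```

In the output, commits whose patches are identical are paired with “=”, and pairs that differ are marked with “!”. The same comparison can also be written as git range-diff base..topic-v1 base..topic-v2.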

[source]

git grep’s new tricks

When you search for a phrase using git grep, it’s often helpful to have additional information pertaining to each match, such as its line number and function context.

In Git 2.19 you can now locate the first matching column of your query with git grep --column.

If you’re using Vim, you can also try out git-jump, a Git add-on that converts useful locations in your code to jump locations in your text editor. git-jump can take you to merge conflicts, diff hunks, and now, exact grep locations with git grep --column.

git grep --column example
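As a rough sketch of what --column reports, the following creates a scratch repository with a single tracked file and searches it. The file name main.c and its contents are invented for this example.

```shell
# Scratch repository with one tracked file (contents are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'int main(void)\n{\n    return answer();\n}\n' > main.c
git add main.c   # git grep searches tracked files by default

# -n adds the line number; --column adds the 1-indexed byte offset
# of the first match on that line.
match=$(git grep -n --column 'answer' -- main.c)
echo "$match"
# prints: main.c:3:12:    return answer();
```

The fields are file, line, and column, followed by the matching line itself, which is the shape that editor integrations such as git-jump can consume.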

git grep also learned the new -o option (meaning --only-matching). This is useful if you have a non-trivial regular expression and want to gather only the matching parts of your search.

For example, if you want to count all of the various ways that the Git source code spells “SHA-1” (e.g., “sha1”, “SHA1”, and so on):

git grep -o example

(The other options -hiI are to omit the filename, search case-insensitively, and ignore matches in binary files, respectively.)
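Here is a small, self-contained sketch of the same idea on a made-up file rather than the Git source tree. The file notes.txt and the pattern are assumptions chosen for illustration; -E is added so that the pattern can use extended regular expression syntax.

```shell
# Scratch repository with a few different spellings (contents are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'uses sha1 here\nSHA-1 is old\nalso SHA1 and sha1\n' > notes.txt
git add notes.txt

# -o prints only the matched text, one match per line; -h drops the
# filename, -i ignores case, -I skips binary files.
matches=$(git grep -E -ohiI 'sha-?1')

# Tally how often each spelling occurs, most frequent first.
printf '%s\n' "$matches" | sort | uniq -c | sort -rn
```

Because -o emits every match on its own line, the usual sort and uniq -c pipeline is enough to count the variants.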

[source, source]

Sorting branches

The git branch command, like git tag (and their scriptable counterpart, git for-each-ref), takes a --sort option to let you order the results by a number of properties. For example, to show branches in the order of most recent update, you could use git branch --sort=-authordate. But if you always prefer that order, typing that sort option can get tiresome.

Now, you can use the branch.sort config to set the default ordering of git branch:

git branch --sort example

Note that by default, git branch sorts by refname, hence master is first and newest is last. In the above example, we tell Git that we would instead prefer the most recently updated branch first, and the rest in descending order. Hence, newest is first and master is last.
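A minimal way to reproduce this behavior, assuming a scratch repository with two branches named master and newest as in the description above. The commit dates and identity are placeholders picked so that the ordering is predictable.

```shell
# Scratch repository; dates and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Make sure the first branch is called "master" whatever the local default is.
git symbolic-ref HEAD refs/heads/master
GIT_AUTHOR_DATE='2018-01-01 00:00:00 +0000' \
GIT_COMMITTER_DATE='2018-01-01 00:00:00 +0000' \
    git commit -q --allow-empty -m "older commit"

git checkout -q -b newest
GIT_AUTHOR_DATE='2018-09-01 00:00:00 +0000' \
GIT_COMMITTER_DATE='2018-09-01 00:00:00 +0000' \
    git commit -q --allow-empty -m "newer commit"

# Default ordering is by refname.
by_name=$(git branch --format='%(refname:short)')

# Persist the preferred ordering for this repository.
git config branch.sort -authordate
by_date=$(git branch --format='%(refname:short)')

printf 'by name:\n%s\nby date:\n%s\n' "$by_name" "$by_date"
```

Adding --global to the git config call would apply the same default to every repository for your user.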

You might also want to try these other sorting options:

  • --sort=numparent shows merges by how awesome they are
  • --sort=refname sorts branches alphabetically by their name (this is the default, but may be useful to override in your configuration)
  • --sort=upstream sorts branches by the remote from which they originate

[source]

Directory rename detection

Git has always detected renamed files as part of merges. For example, if one branch moves a file from A to B and another modifies content in A, then the resulting merge will apply that modification to the content’s new location in B.

The same thing can happen with files in a directory. If one branch moves a directory from A to B but another adds a new file A/file, we can infer that the file should become B/file when the two are merged. In Git 2.18, git merge does this whenever rename detection is enabled (which is by default).

git merge directory rename example
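The scenario can be sketched in a scratch repository like this. Branch and file names are invented for the example. One assumption to flag: later Git releases gained a merge.directoryRenames setting whose default reports such a move as a conflict for you to confirm, so the sketch sets it to true explicitly; Git 2.19 itself simply ignores the unknown setting.

```shell
# Scratch repository; branch and file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir A
echo one > A/one.txt
echo two > A/two.txt
git add A
git commit -q -m "Add directory A"
git branch base

# One branch renames the whole directory A to B.
git checkout -q -b rename base
git mv A B
git commit -q -m "Rename A to B"

# Another branch, unaware of the rename, adds a new file under A.
git checkout -q -b addition base
echo new > A/file
git add A/file
git commit -q -m "Add A/file"

# Merge the rename; the new file should follow the directory to B.
git -c merge.directoryRenames=true merge -q -m "Merge rename" rename
ls B
```

After the merge, the file added as A/file is expected to live at B/file alongside the renamed files.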

[source]

Tidbits

  • In Git v2.18, a remote code execution vulnerability in .gitmodules was fixed, where an attacker could execute scripts when the victim cloned with --recurse-submodules. If you haven’t upgraded, please do! The fix was also backported to v2.17.1, v2.16.4, v2.15.2, v2.14.4, and v2.13.7, so you’re safe if you’re running one of those. [source]
  • Have you ever run into a Git command line option that should have tab-completed but didn’t? Keeping these up to date has long been an annoying source of manual work for the project, but now the completion of options for most commands is generated automatically (along with the list of commands itself, the names of config options, and more). [source, source, source, source]
  • gpg signing and verification of commits and tags has been extended to work with gpgsm, which uses X.509 certificates instead of OpenPGP keys. These certificates may be easier to manage for centralized groups (e.g., developers working for a large enterprise). [source]
  • To fetch a configuration variable with a “fallback” value, it’s common for scripts to say git config core.myFoo || echo <default>. But that doesn’t give Git the opportunity to interpret <default> for you. When it comes to colors, this is especially important for instances where you ultimately need the ANSI color code for, say, “bold red”, but don’t want to type \033[1;31m.

    git config has long supported this with a special --get-color option, but now there are options that can be applied uniformly to all types of config. For instance, git config --type=int --default=2M core.myInt will expand the default to 2097152, and git config --type=expiry-date --default=2.weeks.ago gc.pruneExpire consistently returns a number of seconds. [source, source]

  • Quick quiz: if git tag -l is shorthand for git tag --list, then what does git branch -l do? If you thought, “surely it doesn’t list all branches”, then congratulations: you’re a veteran Git user!

    In fact, git branch -l has been used since 2006 to establish a reflog for a newly created branch, something that you probably didn’t care about since it became the default shortly after being introduced.

    That usage has been deprecated (you will receive a warning if you use git branch -l), thus clearing the way for git branch -l to mean git branch --list. [source]

  • In our last post, we discussed the new --color-moved option, which (unsurprisingly) colors lines moved in a diff. The lines that were moved must be identical, meaning that the feature would miss re-indented code unless you specified a diff option such as --ignore-space-change. Keep in mind that this option would affect the whole diff, potentially missing space changes that you do care about. In Git 2.19, the whitespace for move detection can be configured independently with the new --color-moved-ws option. [source]
  • Many of Git’s commands are colorized, like git diff, git status, and so on. Since 2.17, a few more commands improved their support for colorization, too. git blame learned to colorize lines based on age or by group. Messages sent from a remote server are now colorized based on their keyword (e.g., “error”, “warning”, etc.). Finally, push errors are now painted red for increased visibility. [source, source, source]
  • If you’ve ever run git checkout with the name of a remote branch, you might know that Git will automatically create a local branch that tracks the remote one. However, if that branch name is found in more than one remote, Git does not know which to use, and simply gives up.

    In 2.19, Git learned the checkout.defaultRemote configuration, which specifies a remote to default to when resolving such an ambiguity. [source]

  • Git interprets certain text encodings (e.g. UTF-16) as binary, meaning that tools like git diff will not show a textual diff. Normally it’s recommended to store your text files as UTF-8, but this isn’t always possible if other tools generate or expect another encoding.

    You can now tell Git which encoding you prefer in your working tree on a per-file basis by setting the working-tree-encoding attribute. This will cause Git to store the files as UTF-8 internally, and convert them back to your preferred encoding on checkout. The result looks good in git diff, as well as on hosting sites. [source]
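To make the git config --type and --default options from the list above concrete, here is a short sketch run in a scratch repository. The keys core.myInt and core.myFlag are hypothetical names with no meaning to Git itself; the sketch assumes Git 2.18 or later, where these options are available.

```shell
# Scratch repository so the lookups do not depend on any existing project.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Unset key: Git falls back to the default and normalizes it as an integer.
size=$(git config --type=int --default=2M core.myInt)
echo "$size"    # 2097152

# Unset key with a boolean default: "yes" is normalized to "true".
flag=$(git config --type=bool --default=yes core.myFlag)
echo "$flag"    # true

# Once the key is set, the configured value wins over the default.
git config core.myInt 1k
configured=$(git config --type=int --default=2M core.myInt)
echo "$configured"    # 1024
```

The benefit over the git config … || echo <default> idiom is that the fallback goes through the same type handling as a configured value, so a script only ever sees one canonical form.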

Cooking

Some features are so big that they’re developed over the course of several releases. We have historically avoided reporting on works in progress in these posts, since the features are often still experimental, or there’s nothing you can directly start using.

That said, some of the topics upstream around this release are too exciting to ignore! So, here’s an incomplete summary of what’s happening upstream:

Partial clones

An important part of Git’s decentralized design is that all clones receive the full history of the project, making all clones true peers of one another. When there aren’t a large number of objects in your repository, things go quickly, but at a certain size clones can become frustratingly slow.

There’s ongoing work to allow “partial” clones which omit some blob and tree objects, in favor of requesting objects from the server as-needed. You can see a design overview of the feature, or even start experimenting yourself. Note that most public servers do not yet support the feature, but you can play with git clone --filter=blob:none against your local Git 2.19 install.
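One way to experiment locally is sketched below, under some assumptions: the directory names server and partial are made up, the “server” is just another repository on disk reached through a file:// URL, and it must opt in to filtering with uploadpack.allowFilter. Since the feature is still in progress, details may differ between Git versions.

```shell
# Scratch area holding a "server" repository and a partial clone of it.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q server
echo "some content" > server/file.txt
git -C server add file.txt
git -C server commit -q -m "Add file"

# The serving side has to allow object filters.
git -C server config uploadpack.allowFilter true

# Clone without blobs; --no-checkout avoids fetching them right away
# to populate the working tree.
git clone -q --no-checkout --filter=blob:none "file://$work/server" partial

# Objects reachable from HEAD but absent locally are printed with a "?" prefix.
missing=$(git -C partial rev-list --objects --missing=print HEAD | grep '^?')
echo "$missing"
```

Checking out a branch in the partial clone would then request the missing blobs from the server on demand.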

[source, source, source, source, source, source]

Commit graphs

Git has a very simple data model: everything is an object named after the hash of its contents, and objects point to each other by those names. Many operations walk the graph formed by those pointers. For example, asking “which releases contain this bug-fix” is really “which tag objects have a path to walk back to commit X” (where X is the commit fixing the aforementioned bug).

Those walks have traditionally required loading each object from disk to find its pointers. But now Git can compute and store properties of each commit in a more efficient format, leading to significantly faster traversals. You can read more about it in a series of blog posts from the feature’s author.
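If you would like to try the commit graph on a repository of your own, the sketch below shows the basic commands on a tiny scratch repository (far too small to show any speedup; the tag name and commits are placeholders). It assumes Git 2.19 or later for the --reachable option and the verify subcommand.

```shell
# Scratch repository with a handful of commits and one "release" tag.
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "Fix the bug"
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "More work"
git commit -q --allow-empty -m "Prepare release"
git tag v1.0

# Opt in to reading the commit-graph file, then write one covering
# every commit reachable from the repository's refs.
git config core.commitGraph true
git commit-graph write --reachable
git commit-graph verify

# A graph walk of the kind described above: which tags contain the fix?
contains=$(git tag --contains "$fix")
echo "$contains"
```

The file ends up under .git/objects/info/commit-graph, and it needs to be rewritten from time to time as new commits arrive.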

[source, source, source]

Protocol v2

Git still uses roughly the same protocol for fetching that was developed in 2005: after a client connects, the server dumps the current state of all branches and tags (called the “ref advertisement”), and then the client asks for the parts it needs to update. As repositories have grown, the cost of this advertisement has become a source of inefficiency.

The protocol has added new features over the years in a backwards-compatible way by negotiating capabilities between the server and client. But one thing that couldn’t be changed is the ref advertisement itself, because it happens before there’s a chance to negotiate.

Now there’s a new protocol which addresses this (and more), providing a way to transfer the advertisement more efficiently. Only a few servers support the new protocol so far, but you can read more about it in this blog post from its designer.
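A client can ask for the new protocol through the protocol.version setting. The following sketch does so against a scratch repository served over a file:// URL; the repository and branch names are invented, and whether version 2 is actually used depends on both sides supporting it.

```shell
# Scratch "server" repository with two branches.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q server
git -C server commit -q --allow-empty -m "Initial commit"
git -C server branch topic
git -C server branch other

# Request protocol v2 for this one command and ask only about "topic".
# Under v2 the client can tell the server which refs it cares about,
# instead of receiving the full advertisement first.
refs=$(git -c protocol.version=2 ls-remote "file://$work/server" topic)
echo "$refs"
```

Setting the GIT_TRACE_PACKET=1 environment variable on the same command prints the exchanged packets, which is a handy way to see which protocol version was negotiated. To make the choice permanent, git config --global protocol.version 2 would apply it to all commands.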

[source, source, source, source]

Transitioning away from SHA-1

We mentioned earlier that all Git objects are named according to a hash of their contents. You might know that the algorithm that determines the value of that hash is SHA-1, which has not been considered safe for some time. In fact, a collision attack was discovered and published last year, which we wrote about in our post on its remediation.

Though SHA-1 collisions in Git are unlikely in practice, the Git project has decided to pick a new hashing algorithm and has made significant progress towards implementing it. Git has chosen SHA-256 as the successor to SHA-1, and is working through the transition plan to convert to it.
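To see what “named according to a hash of their contents” means in practice with SHA-1 today, here is a small sketch. It assumes it runs outside any repository configured for a different hash, and the manual check is skipped if the sha1sum utility is not installed.

```shell
# Work in an empty directory that is not inside a Git repository.
cd "$(mktemp -d)"

# Ask Git for the name it would give a blob containing "hello\n".
blob_id=$(printf 'hello\n' | git hash-object --stdin)
echo "$blob_id"
# prints: ce013625030ba8dba906f756967f9e9ca394464a

# The name is the SHA-1 of a short header ("blob", the size in bytes,
# and a NUL byte) followed by the content itself.
if command -v sha1sum >/dev/null 2>&1; then
    manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)
    echo "$manual_id"
fi
```

Because every object name is derived this way, changing the hash function changes the name of every object, which is why the transition needs a careful plan.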

[source]

And everything else

That’s just a sampling of changes from the last few versions. Read the full release notes for 2.19, or find the release notes for previous versions in the Git repository.