Highlights from Git 2.44
The first Git release of 2024 is here! Take a look at some of our highlights on what’s new in Git 2.44.
The open source Git project just released Git 2.44 with features and bug fixes from over 85 contributors, 34 of them new. We last caught up with you on the latest in Git back when 2.43 was released.
To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.
Faster pack generation with multi-pack reuse
If you’ve ever looked closely at Git’s output when pushing or pulling a repository to/from GitHub[1], you might have noticed the pack-reused number that appears at the end of your output, like so:
$ git clone git@github.com:git/git.git
Cloning into 'git'...
remote: Enumerating objects: 361232, done.
remote: Counting objects: 100% (942/942), done.
remote: Compressing objects: 100% (453/453), done.
remote: Total 361232 (delta 598), reused 773 (delta 487), pack-reused 360290
[...]
If you’ve ever looked at that number (above, this is pack-reused 360290), and wondered what it meant, then look no further!
In general terms, that number refers to how much of the pack GitHub was able to send by (more-or-less) streaming verbatim sections of a pack that already exists down to the cloner, instead of generating a new pack on the fly. When Git is sending objects to the client (when fetching/cloning), to the server (when pushing), or to itself (when repacking), Git needs to generate a packfile that contains the set of objects being transferred. For many of the objects in this pack, Git will locate those objects, open and parse them, then optionally try and pair them with some existing object to form a delta chain.
Repeating this process over all objects in the pack yields a more compact result, since Git will find and pair objects that have similar content to one another to save space. When pushing a small amount of data to GitHub, this search is usually negligible and doesn’t take a significant amount of time. But during a clone, loading and trying to re-delta-ify all of the reachable objects in a repository can become prohibitively expensive, especially when carried out over tens of thousands of clones or more.
To save time, Git takes a shortcut: because the wire format used to transfer objects uses the same representation as the .pack files on disk (in $GIT_DIR/objects/pack), it can reuse sections of an existing packfile byte-for-byte when generating the new pack to send down to the client.
In our above example, that’s exactly what happened: the pack-reused 360290 portion of our output indicated that GitHub was able to reuse 360,290 objects from disk without having to re-open and search for new deltas. That process was carried out only over the remaining objects (in this case, 361,232 less the reused quantity gives us 942 objects that took the slow path).
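To make that arithmetic concrete, here is a minimal sketch that pulls the two totals out of a summary line like the one above and prints how many objects took the slow path. The helper name slow_path_objects is hypothetical; it only parses text and never talks to Git.

```shell
# slow_path_objects: given a "Total N (...), pack-reused M" summary line,
# print N - M, the number of objects that were not reused verbatim.
# Hypothetical helper for illustration; it only parses the text it is given.
slow_path_objects() {
    line=$1
    total=$(printf '%s\n' "$line" | sed -n 's/.*Total \([0-9][0-9]*\).*/\1/p')
    reused=$(printf '%s\n' "$line" | sed -n 's/.*pack-reused \([0-9][0-9]*\).*/\1/p')
    # Give up if either number is missing from the line.
    [ -n "$total" ] && [ -n "$reused" ] || return 1
    echo $((total - reused))
}

# The summary line from the clone above; prints 942.
slow_path_objects "remote: Total 361232 (delta 598), reused 773 (delta 487), pack-reused 360290"
```

The result matches the "Counting objects: 100% (942/942)" line in the same output, which is the set of objects Git had to walk and consider for deltas itself.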
Verbatim pack-reuse sounds like a great deal, right? It is, but there are a couple of gotchas that restrict how often Git can make use of this optimization:
- Packfiles cannot contain the same object more than once. For single-pack reuse, this is easy enough (since the pack we’re reusing from also can’t contain duplicate copies of an object), but it makes implementing multi-pack reuse difficult.
- Certain kinds of deltas (which identify their base by the number of bytes between the delta and base) need to be “patched” if there is an omitted section between the delta and its base, changing the offset.
In order to take full advantage of verbatim pack-reuse, a repository needs to have a majority of its objects packed together in a single packfile. For many repositories, this isn’t a huge deal, but it can become prohibitively expensive for large repositories with many hundreds of millions of objects.
Git 2.44 ships with new support for reusing objects across multiple packs. When using a multi-pack index with reachability bitmaps (for more about these, check out our post, Scaling monorepo maintenance), Git can now take advantage of this optimization across multiple packs, eliminating the need to repack your repository into a single pack.
We’ll cover the precise details in a future blog post dedicated to multi-pack reuse. For now, you might notice a new line of output in your terminal the next time you push to GitHub:
$ git push
Enumerating objects: 350175, done.
Counting objects: 100% (832/832), done.
Compressing objects: 100% (132/132), done.
Total 350175 (delta 735), reused 700 (delta 700), pack-reused 349343 (from 36)
[...]
Notice that instead of just pack-reused, we get an extra piece of information next to it, (from 36), indicating the number of packs from which objects were reused.
To try this out yourself, upgrade your local installation of Git, and run
$ git config --global pack.allowPackReuse multi
$ git multi-pack-index write --bitmap
before the next time you push to GitHub.
[source]
Faster rebases (and much more) with git replay
If you’ve read this series before, you’re no doubt familiar with our coverage of merge-ort, a recent development in Git that is a from-scratch rewrite of the merging backend. If you’re a newcomer to this series (first of all, welcome!), our coverage beginning in our Highlights from Git 2.33 is a great place to start.
merge-ort was introduced almost a dozen Git versions ago and aimed to solve several long-standing issues with its predecessor, the recursive backend. The recursive backend was notoriously difficult to modify, and had difficulty performing well when dealing with merges that involve a large number of renames.
The merge-ort backend was introduced to address these issues, by providing a structured implementation that was correct (with respect to the existing behavior, making it a drop-in replacement for the existing backend), performant, and easy to change. In Git 2.34 (for those interested, our coverage begins here), merge-ort became the default merging backend, meaning that if you’re running Git 2.34 or newer and don’t have any special configuration, you’re almost certainly already making use of merge-ort. Modern versions of Git use the merge-ort backend to resolve conflicts between files on either side of a merge or rebase. With merge-ort in place and widely used, merges and rebases could be computed significantly faster.
But merge-ort also makes it possible to compute merges and rebases without requiring that you have a fully populated checkout of your repository. To perform merges, the merge-tree command gained the --write-tree option, which computes merges with merge-ort without requiring a checked out version of your repository.
Rebases were a different story. The existing git rebase sub-command comes with a lot of historical design decisions and assumptions that would make integrating it with merge-ort less than straightforward, and that would hinder performance unless backwards compatibility guarantees were broken[2].
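As a rough sketch of what a checkout-free merge can look like in a script, the helper below runs git merge-tree --write-tree inside a repository (bare or not) and wraps the resulting tree in a merge commit. The function name merge_in_bare and its argument order are hypothetical, and it assumes a Git new enough to have --write-tree.

```shell
# merge_in_bare: hypothetical helper that merges two branches without a
# working tree. Arguments: repository path, "ours" branch, "theirs" branch.
# Prints the ID of the new merge commit; it does not move any branch.
merge_in_bare() {
    repo=$1 ours=$2 theirs=$3
    # On a clean merge, merge-tree exits 0 and prints the resulting tree ID.
    # A non-zero exit means conflicts (or some other failure).
    tree=$(git -C "$repo" merge-tree --write-tree "$ours" "$theirs") || {
        echo "merge of $theirs into $ours did not complete cleanly" >&2
        return 1
    }
    # Wrap the tree in a commit that has both branch tips as parents.
    git -C "$repo" commit-tree "$tree" -p "$ours" -p "$theirs" \
        -m "Merge $theirs into $ours"
}
```

Calling merge_in_bare /path/to/repo.git main topic prints a commit ID that no ref points at yet; a script would typically follow up with git update-ref to move a branch to it.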
git replay exists to address these challenges. It offers an alternative to git rebase that, in addition to being far more performant:
- Can operate in bare repositories.
- Can rebase branches other than the currently checked-out one (in non-bare repositories).
- Can operate over multiple branches simultaneously.
and much more. GitHub has been using merge-ort for more than a year to power all merges (and more recently, all rebases) performed on GitHub.com, and it has brought substantial performance improvements to both operations.
You might find git replay useful if you’re scripting around in a repository, interested in eking out performance gains relative to git rebase, or are just interested in playing around with the latest and greatest developments in the Git project. Regardless of which camp you’re in, you can learn more about git replay here.
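For the scripting case, a minimal sketch might look like the following. It assumes Git 2.44 or newer, where git replay is still experimental and, rather than moving branches itself, prints ref updates intended for git update-ref --stdin. The helper name replay_branch and its arguments are hypothetical.

```shell
# replay_branch: hypothetical helper. Replays the commits in upstream..branch
# on top of newbase without touching a working tree or index, so it also
# works in bare repositories.
replay_branch() {
    repo=$1 newbase=$2 upstream=$3 branch=$4
    # git replay only prints "update <ref> <new> <old>" lines; a non-zero
    # exit (for example, on conflicts) means there is nothing safe to apply.
    updates=$(git -C "$repo" replay --onto "$newbase" "$upstream..$branch") || return 1
    # Nothing to replay: leave the refs alone.
    [ -n "$updates" ] || return 0
    # Apply the ref updates printed by git replay.
    printf '%s\n' "$updates" | git -C "$repo" update-ref --stdin
}
```

For example, replay_branch /path/to/repo.git main main topic behaves roughly like checking out topic and running git rebase main, except that no checkout is involved.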
[source]
- While we’re on the topic of rebases, let’s talk about --autosquash. In case you’ve never used that option before, don’t worry; here’s a quick introduction. When rebasing, Git will try to combine commits whose subject line begins with fixup! [...], squash! [...], or amend! [...], where the [...] is the subject of some other commit. Git will pair these up and reorganize the todo list to put the fixup! [...] commits (etc.) next to their non-fixup! counterparts.
  Depending on the verb, Git will either combine changes, alter the commit message, or merge successive commit messages together, allowing you to easily edit your work.
  However, previous versions of Git only honored these markers when using interactive rebases with git rebase --interactive (or just git rebase -i, for short). If you wrote a fixup! commit (or similar) and wanted to quickly apply it at the right spot in history, you’d have to either: (a) run git rebase -i and close your $EDITOR, or (b) run GIT_SEQUENCE_EDITOR=true git rebase -i.
  In Git 2.44, autosquash-ing now works with non-interactive rebases, meaning that you can run git rebase --autosquash without -i and apply your fixup! commits in their respective locations without having to inspect the todo list or munge your GIT_SEQUENCE_EDITOR environment variable.
  [source]
- If you’ve been using Git for a long time (or are a newcomer), you’ve probably seen a message beginning with hint:, like so:
  hint: Updates were rejected because the tag already exists in the remote.
  hint: Disable this message with "git config advice.pushAlreadyExists false"
  Like the hint suggests, you can run git config advice.pushAlreadyExists false to tell Git to avoid showing you the message. But what if you find the advice useful? Perhaps you want to be warned (for example) when attempting to push a tag without --force to a remote which already has a tag by that same name. When that’s the case, you likely don’t want to also see the “Disable this message with […]” portion of the hint.
  In Git 2.44, you can now set git config advice.pushAlreadyExists true to indicate that you want to receive that hint, and Git will continue to show it to you, suppressing the “Disable this message with […]” portion of the message.
  [source]
- Quick quiz: what does the --no-sort option do when given to git for-each-ref? If you thought, “surely it doesn’t list all references in a non-alphabetical order,” then congratulations, you’re a veteran Git user!
  Despite its name, --no-sort used to produce the output of git for-each-ref in sorted order, making it unable to take advantage of certain optimizations that assume an arbitrary ordering. As of Git 2.44, it lists references in an unspecified order instead.
  For those interested in the technical details, you can learn more in the patches linked below. If you just want the numbers, you’re in luck: on my machine, git for-each-ref --no-sort outperforms a bog-standard git for-each-ref by more than 20% on a repository with a large number of references.
  [source]
- If you’ve spent much time perusing the Git documentation, you’ve likely encountered the term “pathspec”, and perhaps wondered what it meant. In Git parlance, “pathspec” roughly corresponds to “ways to limit filepaths” when used in conjunction with a Git command.
  There are lots of examples in the documentation, but some notable ones include: git show ':^Documentation/' (meaning, “show me the last commit, excluding any changes in the Documentation directory”), git show ':(icase)**/*sha256*' (meaning, “show me files with ‘sha256’ in their path, regardless of casing”), and git show ':(attr:!binary)' (meaning, “show me files which do not have their binary attribute set via .gitattributes”).
  In Git 2.44, git add now understands the attr pathspec magic, meaning that you can do things like git add ':(attr:!binary)' to stage all text/non-binary files in the index.
  Git 2.44 also introduces a new built-in attribute, called builtin_objectmode, for use with the attr pathspec magic. It allows filtering paths by their mode (for example, 100644 for non-executable files, 100755 for executable ones, 160000 for submodules, etc.). The builtin_ prefix indicates that you can use this attribute without needing to set any values in your .gitattributes file(s), meaning that you can do things like git add ':(attr:builtin_objectmode=100755)' to add all executable files in your working copy.
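Returning to the --autosquash item in the list above: scripts that have to run against both old and new Git versions can pick between the two approaches at run time. The sketch below does so by parsing the output of git version; the helper name autosquash_onto is hypothetical, and the version check is a simplification that only looks at the major and minor numbers.

```shell
# autosquash_onto: hypothetical helper that folds fixup!/squash!/amend!
# commits on the current branch into their targets while rebasing onto $1.
autosquash_onto() {
    upstream=$1
    # Turn "git version X.Y.Z ..." into "X Y".
    ver=$(git version | sed -n 's/^git version \([0-9]*\)\.\([0-9]*\).*/\1 \2/p')
    major=${ver% *} minor=${ver#* }
    # Fall back to 0.0 if the version string could not be parsed.
    : "${major:=0}" "${minor:=0}"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 44 ]; }; then
        # Git 2.44 and newer: autosquash works without --interactive.
        git rebase --autosquash "$upstream"
    else
        # Older Git: go through the interactive machinery, but accept the
        # generated todo list unchanged by using `true` as the editor.
        GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash "$upstream"
    fi
}
```

A typical session would be git commit --fixup=<commit> to record the fix, followed by autosquash_onto main to fold it into place.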
The whole shebang
That’s just a sample of changes from the latest release. For more, check out the release notes for 2.44, or any previous version in the Git repository.
Notes
1. If you’re reading this blog post (especially the footnotes!) there’s a pretty good chance that you have.
2. For those curious, an extensive discussion on why git replay was used instead of extending git rebase can be found on the mailing list here.