The open source Git project just released Git 2.19, with features and bug-fixes from over 60 contributors. Here’s a look at some of the most interesting features introduced in the latest versions of Git.
You might have used
git rebase, a powerful tool for rewriting history by altering commits, their order, or the branch they are based on, to name a few uses. Many people use it to “polish” a series of commits before proposing to merge them into a project. But how can we visualize the differences between two sets of commits, before and after a rebase?
We can use
git diff to show the difference between the two end states, but that doesn’t provide information about the individual commits. And if the base on which the commits were built has changed, the resulting state might be quite different, even if the changes in the commits are largely the same.
Git 2.19 introduces
git range-diff, a tool for comparing two sequences of commits, including changes to their order, commit messages, and the actual content changes they introduce.
As an example, we rewrote a series of three commits and compared the tips of the old and new versions using git range-diff. Its output showed that we had moved the commit introducing README.md to be first instead of second, amended both the commit message and the content of the typo fix, and introduced a new commit to add a missing newline.
When you search for a phrase using
git grep, it’s often helpful to have additional information pertaining to each match, such as its line number and function context.
In Git 2.19 you can now locate the first matching column of your query with
git grep --column.
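Here is a minimal sketch of git grep --column in a throwaway repository (the file and its contents are made up). Combined with -n, each match is printed as the path, the line number, and the 1-based column of the first match on that line.

```shell
#!/bin/sh
# Sketch: report the line and first matching column with git grep.
set -e
cd "$(mktemp -d)"
git init -q
printf 'first line\nsecond needle here\n' > notes.txt
git add notes.txt   # git grep searches tracked files, so stage it

# -n prints the line number; --column adds the 1-based column of the
# first match on that line.
out=$(git grep -n --column needle)
printf '%s\n' "$out"
```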
If you’re using Vim, you can also try out
git-jump, a Git add-on that converts useful locations in your code to jump locations in your text editor.
git-jump can take you to merge conflicts, diff hunks, and now, exact grep locations with
git grep --column.
git grep also learned the new
-o option (meaning
--only-matching). This is useful if you have a non-trivial regular expression and want to gather only the matching parts of your search.
For example, to count all of the various ways that the Git source code spells “SHA-1” (e.g., “sha1”, “SHA1”, and so on), you can combine -o with -h to omit the filename, -i to search case-insensitively, and -I to ignore matches in binary files, and then count the matches.
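The sketch below applies that combination to a made-up sample file in a throwaway repository rather than to Git’s own source. The pattern sha-?1 and the extra -E flag (extended regular expressions) are assumptions chosen for this illustration.

```shell
#!/bin/sh
# Sketch: gather only the matching parts of a search with git grep -o,
# then count each spelling. The sample file is made up for illustration.
set -e
cd "$(mktemp -d)"
git init -q
printf 'uses sha1 and SHA1\nsee SHA-1, then sha1 again\n' > hashes.txt
git add hashes.txt

# -h: omit file names, -i: ignore case, -I: skip binary files,
# -o: print only the matched text, -E: extended regex (assumed pattern).
git grep -hiIoE 'sha-?1' | sort | uniq -c
```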
The git branch command, like git tag (and their scriptable counterpart, git for-each-ref), takes a --sort option to let you order the results by a number of properties. For example, to show the most recently updated branches first, you could use git branch --sort=-authordate. But if you always prefer that order, typing that sort option can get tiresome.
Now, you can use the branch.sort configuration option to set the default ordering of git branch. Note that by default, git branch sorts by refname, so a branch named master is listed before one named newest. Setting branch.sort to -authordate tells Git that we would instead prefer the most recently updated branch first, and the rest in descending order. Hence, newest comes first and master last.
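A minimal sketch, assuming a throwaway repository with two hypothetical branches, main and newest, whose commits are given fixed author dates. It lists the branches once in the default refname order and once after setting branch.sort to -authordate for that repository only.

```shell
#!/bin/sh
# Sketch: make git branch list the most recently updated branch first.
# Branch names and dates are made up for illustration.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"

# An older commit on "main"...
echo one > file.txt
git add file.txt
GIT_AUTHOR_DATE='2018-01-01 12:00:00 +0000' git commit -q -m "old work"

# ...and a newer commit on a branch called "newest".
git checkout -q -b newest
echo two >> file.txt
GIT_AUTHOR_DATE='2018-09-01 12:00:00 +0000' git commit -q -a -m "new work"

echo "default (by refname):"
git branch --format='%(refname:short)'

# Descending author date from now on, for this repository only.
git config branch.sort -authordate
echo "with branch.sort=-authordate:"
git branch --format='%(refname:short)'
```

Adding --global to the git config call would make the preference apply to all of your repositories.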
You might also want to try these other sorting options:
- --sort=numparent sorts by number of parents, showing merges by how awesome they are
- --sort=refname sorts branches alphabetically by their name (this is the default, but may be useful to override in your configuration)
- --sort=upstream sorts branches by the remote from which they originate
Git has always detected renamed files as part of merges. For example, if one branch moves a file from A to B and another modifies content in A, then the resulting merge will apply that modification to the content’s new location in B.
The same thing can happen with files in a directory. If one branch moves a directory from A to B but another adds a new file A/file, we can infer that the file should become B/file when the two are merged. In Git 2.18, git merge does this whenever rename detection is enabled (which it is by default).
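The sketch below reproduces the directory case in a throwaway repository, using A, B, and A/file as named in the text and two hypothetical branches, mover and adder. One assumption is worth flagging: later Git releases added a merge.directoryRenames setting whose default is conflict (the moved path is reported as a conflict instead of being resolved silently), so the sketch sets it to true for the merge. Older versions simply ignore the unknown setting.

```shell
#!/bin/sh
# Sketch: directory rename detection during a merge.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

mkdir A
echo "first file" > A/one.txt
echo "second file" > A/two.txt
git add A
git commit -q -m "add directory A"

# One branch moves the whole directory from A to B...
git checkout -q -b mover
git mv A B
git commit -q -m "rename A to B"

# ...while another adds a new file inside the old directory A.
git checkout -q -b adder main
echo "new file" > A/file
git add A/file
git commit -q -m "add A/file"

# Merge the two; the new file should land at B/file.
git checkout -q mover
git -c merge.directoryRenames=true merge -q --no-edit adder
ls B
```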
- In Git v2.18, a remote code execution vulnerability involving .gitmodules was fixed: an attacker could execute scripts when the victim cloned with --recurse-submodules. If you haven’t upgraded, please do! The fix was also backported to v2.17.1, v2.16.4, v2.15.2, v2.14.4, and v2.13.7, so you’re safe if you’re running one of those. [source]
- Have you ever run into a Git command line option that should have tab-completed but didn’t? Keeping these up to date has long been an annoying source of manual work for the project, but now the completion of options for most commands is generated automatically (along with the list of commands itself, the names of config options, and more). [source, source, source, source]
- gpg signing and verification of commits and tags have been extended to work with gpgsm, which uses X.509 certificates instead of OpenPGP keys. These certificates may be easier to manage for centralized groups (e.g., developers working for a large enterprise). [source]
- To fetch a configuration variable with a “fallback” value, it’s common for scripts to say git config core.myFoo || echo <default>. But that doesn’t give Git the opportunity to interpret <default> for you. When it comes to colors, this is especially important: you ultimately need the ANSI color code for, say, “bold red”, but don’t want to type the raw escape sequence yourself.
git config has long supported this for colors with a special --get-color option, but now there are options that can be applied uniformly to all types of config. For instance, git config --type=int --default=2M core.myInt will expand the default to 2097152, and git config --type=expiry-date --default=2.weeks.ago gc.pruneExpire consistently returns a Unix timestamp in seconds. [source, source]
- Quick quiz: if git tag -l is shorthand for git tag --list, then what does git branch -l do? If you thought, “surely it doesn’t list all branches”, then congratulations: you’re a veteran Git user!
git branch -l has been used since 2006 to establish a reflog for a newly created branch, something that you probably didn’t care about since it became the default shortly after being introduced. That usage has been deprecated (you will receive a warning if you use git branch -l), thus clearing the way for git branch -l to mean git branch --list. [source]
- In our last post, we discussed the new --color-moved option, which (unsurprisingly) colors lines moved in a diff. The lines that were moved must be identical, meaning that the feature would miss re-indented code unless you specified a diff option such as --ignore-space-change. Keep in mind that this option would affect the whole diff, potentially hiding space changes that you do care about. In Git 2.19, whitespace handling for move detection can be configured independently with a new option.
- Many of Git’s commands are colorized, such as git status. Since 2.17, a few more commands have improved their support for colorization, too. git blame learned to colorize lines based on age or by group. Messages sent from a remote server are now colorized based on their keyword (e.g., “error”, “warning”, etc.). Finally, push errors are now painted red for increased visibility. [source, source, source]
- If you’ve ever run git checkout with the name of a remote branch, you might know that Git will automatically create a local branch that tracks the remote one. However, if that branch name is found in more than one remote, Git does not know which to use, and simply gives up.
In 2.19, Git learned the checkout.defaultRemote configuration, which specifies a remote to default to when resolving such an ambiguity. [source]
- Git interprets certain text encodings (e.g., UTF-16) as binary, meaning that tools like git diff will not show a textual diff. Normally it’s recommended to store your text files as UTF-8, but this isn’t always possible if other tools generate or expect another encoding.
You can now tell Git which encoding you prefer in your working tree on a per-file basis by setting the working-tree-encoding attribute. This will cause Git to store the files as UTF-8 internally, and convert them back to your preferred encoding on checkout. The result looks good in git diff, as well as on hosting sites. [source]
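For the gpgsm item in the list above, here is a minimal configuration sketch: setting gpg.format to x509 makes Git use gpgsm for signing. It assumes gpgsm is installed with a usable certificate. The key ID is a hypothetical placeholder, and the signing commands are left as comments because they need a real certificate.

```shell
#!/bin/sh
# Sketch: switch a repository to X.509 (gpgsm) signing.
# Assumes gpgsm is installed and holds a certificate; the key ID below
# is a hypothetical placeholder, not a real certificate.
set -e
cd "$(mktemp -d)"
git init -q
git config gpg.format x509                      # use gpgsm instead of gpg
git config user.signingkey 0x0123456789ABCDEF   # hypothetical key ID
# Signing and verifying then work as before, e.g.:
#   git commit -S -m "signed with an X.509 certificate"
#   git verify-commit HEAD
git config --get gpg.format
```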
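For the git config item in the list above, here is a sketch that runs the typed lookups in a throwaway repository. The keys core.myInt and color.myColor are hypothetical and deliberately left unset, so the --default values are what Git interprets. Note that the relative-date type is spelled expiry-date.

```shell
#!/bin/sh
# Sketch: let git config interpret a fallback value for you.
# core.myInt and color.myColor are hypothetical, unset keys.
set -e
cd "$(mktemp -d)"
git init -q

# Unit suffixes are expanded: 2M -> 2 * 1024 * 1024.
git config --type=int --default=2M core.myInt

# Color names become an ANSI escape sequence a script can print directly.
red=$(git config --type=color --default="bold red" color.myColor)
printf '%sthis is bold red\033[m\n' "$red"

# Relative dates become a Unix timestamp (seconds since the epoch).
git config --type=expiry-date --default=2.weeks.ago gc.pruneExpire
```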
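For the checkout.defaultRemote item in the list above, here is a sketch that fakes the ambiguous situation locally. Two hypothetical remotes, origin and upstream, both appear to have a branch named topic. The remote-tracking refs are created with git update-ref, so nothing is fetched over the network.

```shell
#!/bin/sh
# Sketch: resolve an ambiguous "git checkout <branch>" with
# checkout.defaultRemote. Remotes and branch names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m "initial commit"

# Two remotes that both appear to have a branch named "topic".
git remote add origin https://example.com/origin.git
git remote add upstream https://example.com/upstream.git
git update-ref refs/remotes/origin/topic HEAD
git update-ref refs/remotes/upstream/topic HEAD

# Without the setting, Git cannot pick a remote and gives up.
if git checkout -q topic 2>/dev/null; then
  echo "unexpected: checkout succeeded" >&2
  exit 1
fi

# With it, Git creates a local "topic" that tracks origin/topic.
git config checkout.defaultRemote origin
git checkout -q topic
git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}'
```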
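For the working-tree-encoding item in the list above, here is a minimal sketch. It assumes a Git build with iconv support and an iconv command-line tool. The *.txt pattern, the file name, and the choice of UTF-16LE are illustrative.

```shell
#!/bin/sh
# Sketch: keep a UTF-16LE file in the working tree while Git stores UTF-8.
# Assumes Git was built with iconv support and iconv(1) is available.
set -e
cd "$(mktemp -d)"
git init -q

# Declare the working-tree encoding before adding the file.
echo '*.txt text working-tree-encoding=UTF-16LE' > .gitattributes
printf 'hello\n' | iconv -f UTF-8 -t UTF-16LE > greeting.txt
git add .gitattributes greeting.txt

# The working-tree file is UTF-16LE (two bytes per character)...
wc -c < greeting.txt
# ...but the staged blob is plain UTF-8 text, so diffs stay readable.
git cat-file -p :greeting.txt
```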
Some features are so big that they’re developed over the course of several releases. We have historically avoided reporting on works in progress in these posts, since the features are often still experimental, or there’s nothing you can directly start using.
That said, some of the topics upstream around this release are too exciting to ignore! So, here’s an incomplete summary of what’s happening upstream:
An important part of Git’s decentralized design is that all clones receive the full history of the project, making all clones true peers of one another. When there aren’t a large number of objects in your repository, things go quickly, but at a certain size clones can become frustratingly slow.
There’s ongoing work to allow “partial” clones which omit some blob and tree objects, in favor of requesting objects from the server as-needed. You can see a design overview of the feature, or even start experimenting yourself. Note that most public servers do not yet support the feature, but you can play with
git clone --filter=blob:none against your local Git 2.19 install.
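Here is a minimal sketch of that experiment, assuming a recent Git on both sides of a local clone. The source repository must opt in with uploadpack.allowFilter. The clone also uses a file:// URL so that Git goes through its transport instead of the local-clone shortcut, which ignores filters. The paths and file contents are throwaway.

```shell
#!/bin/sh
# Sketch: a blobless partial clone of a local repository.
# "src" stands in for a server that allows object filters.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)
cd "$work"

git init -q src
echo "some content" > src/file.txt
git -C src add file.txt
git -C src commit -q -m "initial commit"
# The serving side has to opt in to object filtering.
git -C src config uploadpack.allowFilter true

# file:// forces the regular transport so the filter is honored;
# --no-checkout avoids fetching the blobs back right away.
git clone -q --no-checkout --filter=blob:none "file://$work/src" clone

# Objects the clone does not have yet are printed with a leading "?".
git -C clone rev-list --objects --missing=print HEAD | grep '^?'
```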
Git has a very simple data model: everything is an object named after the hash of its contents, and objects point to each other by those names. Many operations walk the graph formed by those pointers. For example, asking “which releases contain this bug fix?” is really asking “which tag objects have a path to walk back to commit X?” (where X is the commit fixing the aforementioned bug).
Those walks have traditionally required loading each object from disk to find its pointers. But now Git can compute and store properties of each commit in a more efficient format, leading to significantly faster traversals. You can read more about it in a series of blog posts from the feature’s author.
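Here is a sketch of the bug-fix question above, assuming a Git version that has git commit-graph write --reachable and git commit-graph verify. The tags v1 and v2 and the empty commits are made up. The commit-graph file only speeds up the walk, so git tag --contains gives the same answer with or without it.

```shell
#!/bin/sh
# Sketch: write a commit-graph file and ask which tags contain a fix.
# Tags, messages, and commits are made up for illustration.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q

git commit -q --allow-empty -m "first release"
git tag v1
git commit -q --allow-empty -m "fix the bug"
fix=$(git rev-parse HEAD)          # plays the role of commit "X"
git commit -q --allow-empty -m "second release"
git tag v2

# Precompute commit properties; older releases (such as 2.19) also
# need core.commitGraph=true before they read the file.
git config core.commitGraph true
git commit-graph write --reachable
git commit-graph verify

# "Which releases contain this bug fix?"
git tag --contains "$fix"
```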
Git still uses roughly the same protocol for fetching that was developed in 2005: after a client connects, the server dumps the current state of all branches and tags (called the “ref advertisement”), and then the client asks for the parts it needs to update. As repositories have grown, the cost of this advertisement has become a source of inefficiency.
The protocol has added new features over the years in a backwards-compatible way by negotiating capabilities between the server and client. But one thing that couldn’t be changed is the ref advertisement itself, because it happens before there’s a chance to negotiate.
Now there’s a new protocol which addresses this (and more), providing a way to transfer the advertisement more efficiently. Only a few servers support the new protocol so far, but you can read more about it in this blog post from its designer.
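To see the new protocol in action, here is a sketch that traces a protocol v2 conversation over a local file:// URL (the repository and tag are throwaway). It uses packet tracing via GIT_TRACE_PACKET and forces version 2 with protocol.version=2. When asked only for tags, the v2 client sends a ref-prefix, so the server can skip advertising every other ref.

```shell
#!/bin/sh
# Sketch: watch a protocol v2 exchange over a local file:// URL.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)
cd "$work"
git init -q src
git -C src commit -q --allow-empty -m "initial commit"
git -C src tag v1

# List only tags; the packet trace goes to an absolute file path.
GIT_TRACE_PACKET="$work/trace.log" \
  git -c protocol.version=2 ls-remote --tags "file://$work/src"

# Show the negotiated version and the prefix the client asked for.
grep -E 'version 2|ref-prefix' "$work/trace.log"
```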
We mentioned earlier that all Git objects are named according to a hash of their contents. You might know that the algorithm that determines the value of that hash is SHA-1, which has not been considered safe for some time. In fact, a collision attack was discovered and published last year, which we wrote about in our post on its remediation.
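As a small illustration of content-based naming, this sketch computes a blob’s name twice: once with git hash-object, and once by hashing the header “blob <size>”, a NUL byte, and the content directly with SHA-1. It assumes the sha1sum tool from GNU coreutils. The content hello is arbitrary.

```shell
#!/bin/sh
# Sketch: a blob's name is the SHA-1 of a short header plus its content.
# Assumes sha1sum (GNU coreutils) is available.
set -e
cd "$(mktemp -d)"   # run outside any repository

# What Git calls this content (no repository is needed without -w).
git_id=$(printf 'hello\n' | git hash-object --stdin)

# The same hash computed by hand: "blob <size>", NUL, then the content.
manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d ' ' -f 1)

echo "$git_id"
echo "$manual_id"
```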
Though SHA-1 collisions in Git are unlikely in practice, the Git project has decided to pick a new hashing algorithm and has made significant progress towards implementing it. Git has chosen SHA-256 as the successor to SHA-1, and is working through the transition plan to convert to it.
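That transition work later became usable as an experimental repository format. This sketch assumes Git 2.29 or later, where git init --object-format=sha256 creates such a repository, and the sha256sum tool. It hashes the same hello blob with the same header scheme, but the result is a 64-character SHA-256 name.

```shell
#!/bin/sh
# Sketch: the same blob in an (experimental) SHA-256 repository.
# Assumes Git 2.29+ and sha256sum (GNU coreutils).
set -e
cd "$(mktemp -d)"
git init -q --object-format=sha256 repo
git -C repo config extensions.objectFormat   # prints sha256

git_id=$(printf 'hello\n' | git -C repo hash-object --stdin)
manual_id=$(printf 'blob 6\000hello\n' | sha256sum | cut -d ' ' -f 1)

echo "$git_id"
echo "$manual_id"
```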