Improved Commit Diffs

We recently rolled out a bunch of improvements to commit pages to make reviewing diffs a bit more pleasant.

Diffstats

Diffstat-style histograms of insertions and deletions for each file are now displayed on commit pages. This is useful for getting a high-level feel for the impact of a commit:

[Screenshot: diffstat]

The diffstat display is similar in spirit to the output generated by git diff --stat: a number representing the total count of changed lines (insertions + deletions), followed by a simple visualization of the insertion-to-deletion ratio.
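
To make the idea concrete, here is a rough Python sketch of how one such row could be rendered. It is an illustration only, not the code behind the commit pages: the function name diffstat_line, the five-cell bar width, and the rounding rules are all assumptions made for the example.

```python
def diffstat_line(path, insertions, deletions, width=5):
    """Render one diffstat-style row: path, total changed lines, +/- bar.

    Hypothetical helper; the bar width and rounding are assumptions.
    """
    total = insertions + deletions
    if total == 0:
        return f"{path} | 0"
    # Use one cell per changed line until the bar is full, then scale down.
    cells = min(total, width)
    plus = round(cells * insertions / total)
    # Keep at least one cell for whichever side is non-zero.
    if insertions and plus == 0:
        plus = 1
    if deletions and plus == cells:
        plus = cells - 1
    minus = cells - plus
    return f"{path} | {total} {'+' * plus}{'-' * minus}"


if __name__ == "__main__":
    print(diffstat_line("lib/app.rb", 30, 10))
    print(diffstat_line("README", 0, 3))
```

The number carries the size of the change and the bar carries only the ratio, which is why a 4-line change and a 400-line change with the same mix of insertions and deletions get the same bar.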

Rename Detection

Git doesn’t track file renames, but it does support heuristic detection of renamed files when performing diff and log operations. We’ve enabled it. The file list now displays a single line for renames instead of separate file add/remove lines:

[Screenshot: diffstat + rename detection]

While it’s nice to see renames reported as such in the file list, the larger benefit comes with the actual diff. Without rename detection, commits with even a small number of renamed files can generate large and noisy diffs. The entire contents of each renamed file are displayed twice: first as all deleted lines and then again as all added lines. These same diffs are reduced to pure signal with rename detection enabled, because only the lines modified between the two files are shown:

[Screenshot: diffs + rename detection]

See the -M option to git-diff(1) for information on using rename detection from the command line.
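
For a feel of what heuristic rename detection means, the Python sketch below pairs each deleted file with the most similar added file. It is a toy stand-in, not Git's algorithm: it scores similarity with difflib from the standard library, the function name detect_renames is made up for this example, and the 0.5 cutoff is an assumption chosen to echo the 50% default similarity documented for -M.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted and added files whose contents are similar enough.

    deleted, added: dicts mapping path -> file content (str).
    Returns (renames, still_deleted, still_added), where renames is a
    list of (old_path, new_path, similarity) tuples.
    Hypothetical helper; Git's real similarity scoring works differently.
    """
    # Score every deleted/added pair by line-level similarity.
    candidates = []
    for old_path, old_text in deleted.items():
        for new_path, new_text in added.items():
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score >= threshold:
                candidates.append((score, old_path, new_path))

    # Accept the best-scoring pairs first; each file is used at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    renames, used_old, used_new = [], set(), set()
    for score, old_path, new_path in candidates:
        if old_path in used_old or new_path in used_new:
            continue
        renames.append((old_path, new_path, score))
        used_old.add(old_path)
        used_new.add(new_path)

    still_deleted = sorted(p for p in deleted if p not in used_old)
    still_added = sorted(p for p in added if p not in used_new)
    return renames, still_deleted, still_added


if __name__ == "__main__":
    deleted = {"lib/foo.rb": "a\nb\nc\nd\n"}
    added = {"lib/bar.rb": "a\nb\nc\nX\n", "README": "hello\n"}
    print(detect_renames(deleted, added))
```

Once a pair is identified as a rename, the diff only needs to cover the lines that differ between the old and new file, which is where the reduction in noise comes from.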

Added / Removed Files

Previously, files added or removed in a commit were shown in the file list at the top of commit pages but the actual diffs were omitted. This was a simple guard against Insanely Large Diffs That Crashed Browsers but had a few notable drawbacks:

  • It was easy to miss important changes introduced by added or removed files when reviewing commits.
  • It wasn’t possible to comment on specific lines in added or removed files.
  • It didn’t always avoid large diffs. Consider cases like SQL database dumps where each line of a large generated file is modified as part of an otherwise tiny commit. Omitting added/removed files gave no guarantee that diffs would not exceed a reasonable size.

According to Aldo Cortesi’s GitHub project analysis, the average commit touches about 4 files and 19 lines of code. We felt that commit pages needed to do a better job of showing all pertinent information on these common-case commits, so from now on you’ll see diffs for added and removed files:

[Screenshot: comment on added files]

Large Diffs

Displaying added/removed files left the problem of how to deal with very large diffs. What we came up with is a set of rules for omitting portions of large diffs that ensures a sane upper bound on overall diff size. It works something like this:

  • Diffs are not shown for any individual file with more than 300 changed lines (this includes modified files as well as added/removed files).
  • No more than 150 total file diffs are displayed.
  • No more than 3,000 total changed lines are shown across all diffs.

While we expect to tune these numbers over the coming weeks, the result so far has been diffs that show more of what you typically want to see and less of what you don’t.
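
The three rules above can be sketched in a few lines of Python. Treat this as one plausible reading rather than the actual implementation: the function name select_diffs, the input shape, the file-by-file evaluation order, and the choice to skip an over-budget file while still considering later ones are all assumptions.

```python
# Limits taken from the rules above.
MAX_LINES_PER_FILE = 300
MAX_FILES = 150
MAX_TOTAL_LINES = 3000


def select_diffs(files):
    """Decide which file diffs to show on a commit page.

    files: list of (path, changed_lines) tuples in display order.
    Returns a list of (path, shown) tuples in the same order.
    Hypothetical helper illustrating the three limits.
    """
    shown_files = 0
    shown_lines = 0
    result = []
    for path, changed in files:
        show = (
            changed <= MAX_LINES_PER_FILE                  # per-file limit
            and shown_files < MAX_FILES                    # file-count limit
            and shown_lines + changed <= MAX_TOTAL_LINES   # overall line limit
        )
        if show:
            shown_files += 1
            shown_lines += changed
        result.append((path, show))
    return result


if __name__ == "__main__":
    commit = [("db/dump.sql", 5000), ("app.rb", 12), ("new_file.rb", 300)]
    for path, shown in select_diffs(commit):
        print(path, "shown" if shown else "omitted")
```

In this reading, a huge generated file is omitted on its own without hiding the small, interesting changes that sit next to it in the same commit.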
