Improved Commit Diffs
We recently rolled out a bunch of improvements to commit pages to make reviewing diffs a bit more pleasant.
Diffstats
Diffstat-style histograms of insertions and deletions for each file are now displayed on commit pages. This is useful for getting a high-level feel for the impact of a commit:
The diffstat display is similar in spirit to the output generated by git diff --stat: a number representing the total number of changed lines (insertions + deletions) followed by a simple visualization of the insertion-to-deletion ratio.
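To make the idea concrete, here is a minimal sketch of how one such row could be built. It is not the code behind commit pages; the function name diffstat_line, the width parameter, and the row layout are all assumptions made for illustration.

```python
def diffstat_line(path, insertions, deletions, width=5):
    """Format one diffstat-style row: path, total changed lines, +/- bar.

    `width` is a hypothetical bar length, chosen only for this sketch.
    """
    total = insertions + deletions
    if total == 0:
        # Nothing changed, so there is no ratio to draw.
        return f"{path} | 0"
    # Split the bar so that '+' and '-' approximate the
    # insertion-to-deletion ratio.
    plus = round(width * insertions / total)
    minus = width - plus
    return f"{path} | {total} {'+' * plus}{'-' * minus}"


print(diffstat_line("lib/parser.rb", 8, 2))
```

A file with 8 insertions and 2 deletions gets a total of 10 and a bar that is mostly pluses, which is the at-a-glance signal the histogram is meant to give.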
Rename Detection
Git doesn’t track file renames, but it does support heuristic detection of renamed files when performing diff and log operations. We’ve enabled it. The file list now displays a single line for renames instead of separate file add/remove lines:
While it’s nice to see renames reported as such in the file list, the larger benefit comes with the actual diff. Without rename detection, commits with even a small number of renamed files can generate large and noisy diffs. The entire contents of each renamed file are displayed twice: first with all deleted lines and then again with all added lines. These same diffs are reduced down to pure signal with rename detection enabled because only the lines modified between the two files are shown:
See the -M option to git-diff(1) for information on using rename detection from the command line.
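For a rough feel of what heuristic rename detection means, the sketch below pairs each removed file with the most similar added file and calls it a rename when the similarity clears a threshold. This is not Git's algorithm: it uses Python's difflib as a stand-in similarity measure, and detect_renames and its arguments are hypothetical names. The 0.5 default mirrors the 50% similarity that -M uses when no value is given.

```python
from difflib import SequenceMatcher


def detect_renames(removed, added, threshold=0.5):
    """Pair removed and added files whose contents are similar enough.

    `removed` and `added` map path -> file content (str).
    Returns a list of (old_path, new_path, similarity) tuples.
    Illustrative only: difflib stands in for Git's own similarity index.
    """
    renames = []
    unmatched = dict(added)  # added files not yet claimed by a rename
    for old_path, old_text in removed.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in unmatched.items():
            # Compare line by line; ratio() is 1.0 for identical content.
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del unmatched[best_path]
    return renames


removed = {"old_name.py": "a\nb\nc\nd\n"}
added = {"new_name.py": "a\nb\nc\nX\n", "notes.txt": "unrelated\n"}
for old, new, score in detect_renames(removed, added):
    print(f"{old} -> {new} ({score:.0%} similar)")
```

Once a pair is identified as a rename, the diff only needs to show the lines that differ between the old and new file rather than the whole file twice.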
Added / Removed Files
Previously, files added or removed in a commit were shown in the file list at the top of commit pages but the actual diffs were omitted. This was a simple guard against Insanely Large Diffs That Crashed Browsers but had a few notable drawbacks:
- It was easy to miss important changes introduced by added or removed files when reviewing commits.
- It wasn’t possible to comment on specific lines in added or removed files.
- It didn’t always avoid large diffs. Consider cases like SQL database dumps where each line of a large generated file is modified as part of an otherwise tiny commit. Omitting added/removed files gave no guarantee that diffs would not exceed a reasonable size.
According to Aldo Cortesi’s GitHub project analysis, the average commit touches about 4 files and 19 lines of code. We felt that commit pages needed to do a better job showing all pertinent information on these common case commits, so from now on you’ll see diffs for added and removed files:
Large Diffs
Displaying added/removed files left the problem of how to deal with very large diffs. What we came up with is a set of rules for omitting portions of large diffs that ensures a sane upper bound on overall diff size. It works something like this:
- Diffs are not shown for any individual file with more than 300 changed lines (this includes modified files as well as added/removed files).
- No more than 150 total file diffs are displayed.
- No more than 3,000 total changed lines are shown across all diffs.
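The three rules above can be sketched as a single pass over the files in a commit. The limits come straight from the list, but everything else is an assumption for illustration: the function name select_diffs, the input shape, the order in which the rules are checked, and the choice to skip an over-budget file and keep going rather than stop at the first one.

```python
MAX_LINES_PER_FILE = 300   # per-file limit from the rules above
MAX_FILE_DIFFS = 150       # cap on the number of file diffs displayed
MAX_TOTAL_LINES = 3000     # cap on changed lines across all diffs


def select_diffs(files):
    """Decide which file diffs to display.

    `files` is a list of (path, changed_lines) in display order.
    Returns (shown, omitted) as two lists of paths.
    """
    shown, omitted = [], []
    total = 0
    for path, changed in files:
        too_big = changed > MAX_LINES_PER_FILE
        too_many = len(shown) >= MAX_FILE_DIFFS
        over_budget = total + changed > MAX_TOTAL_LINES
        if too_big or too_many or over_budget:
            omitted.append(path)
        else:
            shown.append(path)
            total += changed
    return shown, omitted


shown, omitted = select_diffs(
    [("app/models/user.rb", 12), ("db/dump.sql", 5000), ("README", 4)]
)
print("shown:", shown)
print("omitted:", omitted)
```

In this example the small edits are displayed while the generated SQL dump is left out, which is the behavior the SQL dump case described earlier calls for.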
While we expect to tune these numbers over the coming weeks, the result so far has been diffs that show more of what you typically want to see and less of what you don’t.