Tuesday, September 28, 2010

Stupid Git Tricks for PostgreSQL

Even before PostgreSQL switched to git, we had a git mirror of our old CVS repository.  So I suppose I could have hacked up these scripts any time.  But I didn't get around to it until we really did the switch.  Here's the first one.  It's a one-liner.  For some definition of "one line".

git log --format='%H' --shortstat `git merge-base REL9_0_STABLE master`..master | perl -ne 'chomp; if (/^[0-9a-f]/) { print $_, " "; } elsif (/files? changed/) { my ($i) = /(\d+) insertion/; my ($d) = /(\d+) deletion/; print(($i || 0) + ($d || 0), "\n") }' | sort -k2 -n -r | head | cut -d' ' -f1 | while read commit; do git log --shortstat -n 1 $commit | cat; echo ""; done

This will show you the ten "biggest" commits since the REL9_0_STABLE branch was created, measured by the number of lines touched (insertions plus deletions).  Of course, this isn't a great proxy for significance, as the output shows.  Heavily abbreviated, largest first:

66424a284879b Fix indentation of verbatim block elements (Peter Eisentraut)
9f2e211386931 Remove cvs keywords from all files (Magnus Hagander)
4d355a8336e0f Add a SECURITY LABEL command (Robert Haas)
c10575ff005c3 Rewrite comment code for better modularity, and add necessary locking (Robert Haas)
53e757689ce94 Make NestLoop plan nodes pass outer-relation variables into their inner relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan (Tom Lane)
5194b9d04988a Spell and markup checking (Peter Eisentraut)
005e427a22e3b Make an editorial pass over the 9.0 release notes. (Tom Lane)
3186560f46b50 Replace doc references to install-win32 with install-windows (Robert Haas)
debcec7dc31a9 Include the backend ID in the relpath of temporary relations (Robert Haas)
2746e5f21d4dc Introduce latches. A latch is a boolean variable, with the capability to wait until it is set (Heikki Linnakangas)

Some of these are not-very-interesting commits that happen to touch a lot of lines of code, but several of them represent significant refactoring work that should lead to good things down the line.  In particular, latches are intended to reduce replication latency and eventually facilitate synchronous replication; and Tom's PARAM_EXEC refactoring is one step towards support for the SQL LATERAL construct.
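The ranking in the first one-liner depends entirely on its perl stage, which turns each --shortstat summary into a single "lines touched" number. Below is a minimal sketch of that step in plain sh and awk. The function name lines_touched is hypothetical and not part of git. It finds the counts by the words "insertion" and "deletion" rather than by field position. That way it also copes with the "1 file changed" wording and with summaries that leave out a zero count, both of which newer versions of git produce.

```shell
#!/bin/sh
# Hypothetical helper: print insertions + deletions from one
# `git log --shortstat` summary line passed as $1.
lines_touched() {
    printf '%s\n' "$1" | awk '{
        total = 0
        # Each count is the field just before the word it describes,
        # e.g. "10 insertions(+)," or "1 deletion(-)".
        for (i = 2; i <= NF; i++)
            if ($i ~ /^(insertion|deletion)/)
                total += $(i - 1)
        print total
    }'
}

lines_touched " 3 files changed, 10 insertions(+), 2 deletions(-)"   # prints 12
lines_touched " 1 file changed, 1 insertion(+)"                      # prints 1
lines_touched " 2 files changed, 7 deletions(-)"                     # prints 7
```

A line that mentions neither word yields 0, so a commit with no recorded changes simply sorts to the bottom.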

OK, one more.

#!/bin/sh

# The commit at which REL9_0_STABLE branched off from master
BP=`git merge-base master REL9_0_STABLE`

git log --format='format:%an' $BP..master | sort -u |
while read -r author; do
    # printf is portable; "echo ... \c" only suppresses the newline in some shells
    printf '%s: ' "$author"
    git log --author="$author" --numstat $BP..master |
    awk '/^[0-9]/ { P += $1; M += $2 }
         /^commit/ { C++ }
         END { print C+0 " commits, " P+0 " additions, " M+0 " deletions, " (P+M) " total"}'
done

This one shows you the total number of lines of code committed to 9.1devel, summed up by committer.  It has the same problem as the previous script: sometimes you change a lot of lines of code without actually doing anything terribly important.  It has a further problem, too: it only takes into account the committer, rather than other important roles such as reporters, authors, and reviewers.  Unfortunately, that information can't easily be extracted from the commit logs in a structured way.  I would like to see us address that defect in the future, but we're going to need something more clever than git's Author field.  Most non-trivial patches, in the form in which they are eventually committed, are the work of more than one person; and, at least IMO, crediting only the main author (if there even is one) would be misleading and unfair in many cases.
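The script above runs git log once per name and passes each name to --author. git treats that argument as a regular expression matched against the whole author line, so a name that is a substring of another name can pick up someone else's commits. Below is a sketch of a single-pass alternative, assuming the same grouping by author name (%an) as the original. The function name per_author_stats is hypothetical. It tags each commit with an "@" marker line and lets awk do all the bookkeeping.

```shell
#!/bin/sh
# Hypothetical helper: per-author commit and line counts from one git log pass.
# All arguments (e.g. a revision range) are passed straight to git log.
per_author_stats() {
    # "@Name" starts each commit; --numstat lines look like "added<TAB>deleted<TAB>path".
    # Binary files show "-" instead of numbers and are skipped, as in the original.
    git log --numstat --format='@%an' "$@" |
    awk '
        /^@/     { author = substr($0, 2); C[author]++; next }
        /^[0-9]/ { P[author] += $1; M[author] += $2 }
        END {
            for (a in C)
                print a ": " C[a] " commits, " (P[a] + 0) " additions, " (M[a] + 0) " deletions, " (P[a] + M[a]) " total"
        }' |
    sort
}
```

In a PostgreSQL checkout, something like per_author_stats "$(git merge-base master REL9_0_STABLE)..master" should print one line per author in the same format as the original script, sorted by name.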

I think the most interesting tidbit I learned from playing around with this stuff is that git merge-base can be used to find the branch point for a release.  That's definitely handy.
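git merge-base prints the best common ancestor of two commits. For a branch that is never merged back into master, that ancestor is exactly the commit the branch was created from. The sketch below checks this in a throwaway repository. The branch name STABLE_DEMO and the commit messages are made up for the demonstration.

```shell
#!/bin/sh
# Show that `git merge-base` recovers the commit a branch forked from.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Fixed identity so commits work even without any git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "initial"
git commit -q --allow-empty -m "last commit before branching"
fork=$(git rev-parse HEAD)

# Create the stable branch here, then let both lines of development move on.
git branch STABLE_DEMO
git commit -q --allow-empty -m "new development on the main line"
git checkout -q STABLE_DEMO
git commit -q --allow-empty -m "back-patched fix on the stable branch"
git checkout -q -    # back to the main line

bp=$(git merge-base HEAD STABLE_DEMO)
echo "fork point: $fork"
echo "merge-base: $bp"    # same hash as the fork point
```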
