email as a bona fide git transport

From: Vegard Nossum
Date: Wed Oct 16 2019 - 06:23:11 EST


(Cross-posted to git, LKML, and the kernel workflows mailing lists.)

Hi all,

I've been following Konstantin Ryabitsev's quest for better development
and communication tools for the kernel [1][2][3], and I would like to
propose a relatively straightforward idea which I think could bring a
lot to the table.

Step 1:

* git send-email needs to include parent SHA1s and generally all the
information needed to perfectly recreate the commit when applied so
that all the SHA1s remain the same

* git am (or an alternative command) needs to recreate the commit
perfectly when applied, including applying it to the correct parent

Having these two will allow a perfect mapping between email and git;
essentially email just becomes a transport for git. There are a lot of
advantages to this, particularly that you have a stable way to refer to
a patch or commit (despite it appearing on a mailing list), and there
is no need for "changeset IDs" or whatever, since you can just use the
git SHA1 which is unique, unambiguous, and stable.
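
To make concrete why carrying the parent SHA1 and the full metadata is enough, here is a minimal sketch (not taken from the attached patches) of how an object ID is derived: it is the SHA1 of a short header followed by the object's bytes. The helper names git_object_id and make_commit and all sample field values are made up for illustration, and the sketch assumes sha1sum is installed.

```shell
#!/bin/bash
# Sketch: an object's ID is SHA1("<kind> <size>\0" + content).
# git_object_id is a hypothetical helper, not a git command.
git_object_id () {
    kind=$1
    file=$2
    size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
    { printf '%s %d\0' "$kind" "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# Write the body of a commit object from its fields (all values are made up).
make_commit () {
    parent=$1
    printf 'tree %s\n' 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    printf 'parent %s\n' "$parent"
    printf 'author A U Thor <author@example.com> 1571224991 +0200\n'
    printf 'committer A U Thor <author@example.com> 1571224991 +0200\n'
    printf '\nexample: subject line\n'
}

tmp=$(mktemp -d)
: > "$tmp/empty"
make_commit 1111111111111111111111111111111111111111 > "$tmp/sent"
make_commit 1111111111111111111111111111111111111111 > "$tmp/received"
make_commit 2222222222222222222222222222222222222222 > "$tmp/rebased"

empty_tree_id=$(git_object_id tree "$tmp/empty")
sent_id=$(git_object_id commit "$tmp/sent")
received_id=$(git_object_id commit "$tmp/received")
rebased_id=$(git_object_id commit "$tmp/rebased")

echo "empty tree:          $empty_tree_id"
echo "sent:                $sent_id"
echo "received (same):     $received_id"
echo "rebased (new parent): $rebased_id"
```

Identical fields give an identical ID on both ends, while changing only the parent gives a different one. That is why the receiving side has to apply the patch on the recorded parent, with the recorded author and committer data, for the SHA1 to survive the trip through email.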

As a rough proof of concept I've attached 3 git patches which implement
this. There are issues to work out like exact format, encodings, mail
mangling, error handling, etc., but hopefully the git community can
help out here. (Improvement suggestions are welcome!)

Step 2:

* A bot that follows LKML (and other lists) and imports patchsets into
a git repository hosted on git.kernel.org

* The bot can add git notes with URLs to lore (and/or other mailing
list archives) and store them in e.g. refs/notes/lore,
refs/notes/lkml, etc.

(For those who don't use git notes yet: they are essentially small
bits of information you can add to a commit without changing its SHA1,
and you can configure tools like 'git log' to show these at the bottom
of a commit. Notes can also exist in a repo completely separate from
the commits they attach data to, so there is _zero_ overhead for those
who don't want to use this.)
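
As a rough sketch of what the bot could do for each imported commit, the snippet below attaches an archive link as a note under refs/notes/lore in a throwaway repository. The helper name add_lore_note, the "Link:" prefix, the identity, and the URL are placeholders, not part of any existing tool.

```shell
#!/bin/bash
# Hypothetical helper: attach an archive URL to a commit under refs/notes/lore.
add_lore_note () {
    commit=$1
    url=$2
    git notes --ref=lore add -f -m "Link: $url" "$commit"
}

# Demo in a throwaway repository (identity and URL are placeholders).
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git commit -q --allow-empty -m 'example: first patch'

before=$(git rev-parse HEAD)
add_lore_note HEAD 'https://lore.example/list/message-id/'
after=$(git rev-parse HEAD)

# The note lives in its own ref; the commit itself is untouched.
note=$(git notes --ref=lore show HEAD)
git log -1 --notes=lore
```

Running 'git config notes.displayRef refs/notes/lore' once makes 'git log' show these notes without the extra option.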

* Maintainers can either pull patchsets directly from this bot-
maintained repo OR they can continue to apply patches from their inbox
(the result should be the same either way) OR they can continue in the
old-style process (at least for a while) and just not have the
benefits of the new process.

Step 3:

* Instead of describing a patchset in a separate introduction email, we
can create a merge commit between the parent of the first commit in
the series and the last and put the patchset description in the merge
commit [5]. This means the patchset description also gets to be part
of git history.

(This would require support for git send-email/am to be able to send
and apply merge commits -- at least those which have the same tree as
one of the parents. This is _not_ yet supported in my proposed git
patches.)
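
The "same tree as one of the parents" condition can be checked mechanically. The sketch below builds such a merge in a throwaway repository, the same way footnote [5] does, and tests it. The helper name is_description_merge is hypothetical, and it only looks at the second parent because that is where footnote [5] puts the tip of the series.

```shell
#!/bin/bash
# Hypothetical check: is COMMIT a merge whose tree is identical to the
# tree of its second parent (i.e. it adds a description but no changes)?
is_description_merge () {
    commit=$1
    git rev-parse -q --verify "$commit^2" >/dev/null || return 1
    [ "$(git rev-parse "$commit^{tree}")" = "$(git rev-parse "$commit^2^{tree}")" ]
}

# Demo in a throwaway repository (identity and contents are placeholders).
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git commit -q --allow-empty -m 'base'
base=$(git rev-parse HEAD)
echo one > file && git add file && git commit -q -m 'patch 1'
echo two >> file && git commit -q -a -m 'patch 2'

# Merge with the base as first parent and the series tip as second parent.
merge=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" \
        -p "$base" -p HEAD -m 'Patchset title')

if is_description_merge "$merge"
then
    echo "$merge carries only a description"
fi

# The patches in the series, without the merge itself.
patches=$(git rev-list --count --no-merges "$merge^-")
git log --oneline --no-merges "$merge^-"
```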

* Stable SHA1s mean we can refer to previous versions of a patchset by
SHA1 rather than archive links. I propose a new changelog tag for
this, maybe "Previous:" or maybe even a full list of "v1:", "v2:",
etc. with a SHA1 or ref. Note that these SHA1s do *not* need to exist
in Linus's repo, but those who want can pull those branches from the
bot-maintained repo on git.kernel.org.
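
A minimal sketch of how a tool could follow such a tag, assuming the "Previous-version:" spelling used in footnote [5] (the final tag name is still open). The helper name previous_version and the sample message and SHA1 are made up.

```shell
#!/bin/bash
# Hypothetical helper: read a commit message on stdin and print the SHA1
# named by its last "Previous-version:" line, if any.
previous_version () {
    sed -n 's/^Previous-version:[[:space:]]*\([0-9a-f]\{40\}\)[[:space:]]*$/\1/p' |
        tail -n 1
}

message='Patchset title

Description of the series.

Previous-version: 0123456789abcdef0123456789abcdef01234567'

prev=$(printf '%s\n' "$message" | previous_version)
none=$(printf 'Patchset title\n\nFirst version.\n' | previous_version)

echo "previous version: ${prev:-none}"
echo "previous version: ${none:-none}"

# In a real repository the message would come from the merge commit, e.g.
#   git show --format=format:%B --no-patch "$new" | previous_version
# and, with both versions fetched, the two series could be compared with
#   git range-diff "$prev^1..$prev^2" "$new^1..$new^2"
# where $new is the merge commit of the current version.
```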

Advantages:

- we can keep using email to post patches/patchsets

- the process is opt-in (but should be encouraged) for both authors and
maintainers, and the transition can happen over time

- there is a central repo for convenience, but it is not necessary for
development to happen and is not a single point of failure -- it's
more like Linus's repo and can be moved or even replicated from
scratch by somebody else simply by having mailing list archives

- allows quick lookup of patch/patchset <-> email discussion within git

- allows diffing between versions of a single logical patchset

- patchset descriptions naturally become part of the changelog that ends
up in Linus's tree

Disadvantages:

- requires patching git

- requires a bot to continuously create branches for patchsets sent to
mailing lists

- increased storage/bandwidth for git.kernel.org (?)

- may need a couple of new wrapper scripts to automate patchset
construction/versioning

Thoughts?


Vegard

PS: Eric Wong described something that comes quite close to this idea, but AFAICT without actually recreating commits exactly. I've included the link for completeness. [4]


[1]: https://lwn.net/Articles/793037/ "Ryabitsev: Patches carved into
developer sigchains"

[2]: https://lwn.net/Articles/799134/ "Defragmenting the kernel
development process"

[3]: https://lore.kernel.org/workflows/20190924182536.GC6041@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/

[4]: https://lore.kernel.org/workflows/20191008003931.y4rc2dp64gbhv5ju@dcvr/

[5]: To create this merge commit one could use something like this (bash):

    # usage: patchset BASE [PREVIOUS_VERSION]
    patchset () {
        start=$1
        prev=$2

        # construct tentative commit message
        commit_editmsg="$(git rev-parse --git-dir)/COMMIT_EDITMSG"
        (
            if [ -z "$prev" ]
            then
                echo 'Patchset title'
                echo
                echo Commits:
                echo
                git log --oneline "$start"..HEAD
            else
                # start from the previous version's description
                git show --format=format:%B --no-patch "$prev"
                echo "Previous-version: $(git rev-parse "$prev")"
            fi
        ) > "${commit_editmsg}"

        # left unquoted so that an EDITOR with arguments works; falls back to vi
        ${EDITOR:-vi} "${commit_editmsg}"

        # merge of BASE and HEAD that reuses HEAD's tree unchanged
        merge=$(git commit-tree -p "$start" -p HEAD -F "${commit_editmsg}" "$(git rev-parse 'HEAD^{tree}')")
        echo "$merge"
    }

This will open the editor to edit the patchset description and create a
merge commit that encompasses the patches in the patchset (use sha1^- to
view the patches in it).