Re: [PATCH] Documentation: Add "how to write a good patch summary" to SubmittingPatches

From: Ingo Molnar
Date: Thu Apr 16 2009 - 18:09:05 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 16 Apr 2009, Theodore Tso wrote:
>
> > On Thu, Apr 16, 2009 at 10:12:55PM +0200, Ingo Molnar wrote:
> > > as a bug triager i can, within 1 minute, sort all the commits by
> > > risk:
> > >
> > > Low risk cleanups:
> > > ...
> > > Runtime crash fixes:
> > > ...
> > > Robustness enhancements:
> > > ...
> > > Low-risk features:
> > > ...
> > > High-risk features:
> > > ...
> >
> > Sure, but if that's the goal, maybe instead we should have some
> > keywords that we tag onto one-line summary, i.e.
> >
> > ext4 <LR,cleanup>:
>
> Hell no.

I find those artificial tags pretty ugly too.
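
They are also unnecessary: plain 'Impact:' lines in the commit
body are already trivially machine-sortable, without any markup in
the summary line. Something like this does the sorting (the
release range below is only illustrative):

    # tally commits per Impact: category in a given range
    git log v2.6.28..v2.6.29 --format='%b' |
            grep '^Impact:' | sort | uniq -c | sort -rn

    # list only the commits in one category
    git log v2.6.28..v2.6.29 --oneline --grep='^Impact: cleanup'

No '<LR,cleanup>' style decoration of the summary line is needed
for that.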

> The fact is, those "low risk cleanups" break things.
>
> People who think that you can assess the risk of a commit
> before-hand and then rely on it are clueless morons.

That's why it's _hard_ to write good impact lines - it takes quite
a bit of effort to assess the _expected_ impact of a commit
reliably, without looking like a complete fool a few days, weeks
or months down the road.
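
To make it concrete, here is roughly what such an annotated commit
log looks like - a made-up example, not an actual commit:

    sched: fix runtime accounting race on SMP

    Impact: fix rare misaccounting of task runtimes

    The runtime delta was read outside the runqueue lock, so two
    CPUs updating the same task could lose an update. Move the
    read-modify-write under the lock.

The Impact: line is the single-line risk/benefit statement; the
rest of the log stays the usual free-form explanation.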

Those mis-judgments are also _useful_, for that exact reason: they
show us the precise patterns of mis-assessed impact and give us a
feedback loop.
We learned to be a _lot_ more careful about certain areas of code by
looking at the impact lines of commits that turned out to be broken.

And if the impact cannot be assessed reliably by looking at the
patch? Then i ask contributors to split up the patch into smaller
steps.

And the thing is, commit logs themselves - as you can see in the
18 specific examples i analyzed above - can be _far more_
ambiguous about the true impact of a change, and you are fooling
yourself if you don't admit to this basic, daily fact of Linux
kernel commit logs.

Also, natural-language commit logs tend not to be straightforward
about impact, because there's a basic inner
(sub-conscious) drive in most developers to play down the impact of
some really embarrassing brown paper bag bug, or to not think too
hard about the risks of a new feature.

Impact lines _remove_ this fear and the associated guilt factor.
They make the production of commit logs _more positive_, because
there is a hard, unavoidable rule to admit to crap and mistakes in
a neutral, unemotional way. And if everyone does it consistently,
it all looks a lot less embarrassing.

The basic problem is that natural languages are one big babble
machine, chock-full of inner paradoxes and contradictions: they
are vaguely defined, ambiguous, emotion-laden and over-verbose -
fertile ground for whitewashing and obscuring information, or for
simply burying it in white noise.

A good commit log will tell us a nice story and gently, gradually
drive us along the path of the developer's thought process. But in
the overwhelming majority of cases it will not take us past the
more embarrassing bits: how stupid a bug it fixes, how severe that
bug is, or how risky a new feature is.

_LOOK_ at the 18 commit logs i spent an hour analyzing. That is
our reality - those are the top-notch commits we have, drawn from
the best of the best 5%.

_ADMIT_ that this basic equation is not going to change
significantly. There are small steps of progress, but our commit
sucked 5 and 10 years ago too, and they sucked for very fundamental
reasons.

The ext4 logs were of exceptional quality - and we even saw one
clear 'brown paper bag' bug admitted to there, frankly and openly.
But that's the exception - and still, the impact lines i added
_clearly improved the end result_.

The impact line forces honesty without actually accusing anyone
of (unintentionally) trying to mislead others. It also keeps
people from sub-consciously _fooling themselves_.

Will there be mistakes? Sure - and managing mistakes is the
_point_ of risk analysis, so why would we want to claim that the
risk assessment is perfect? There are mistakes everywhere in the
kernel, and the only way to tackle them is to have a clear picture
of them. Human mistakes fundamentally affect the quality-mapping
system too - and analyzing them is an important part of quality
analysis.

Can they be relied on? They can be relied on the same way any
written-down words that accompany code can be: it depends on how
much i trust the person who wrote them, and on the actual track
record of that code. So it can be 99% trust, or a very low level
of trust.

In the end, only reading the code will tell for sure. Sometimes not
even that.

Ingo