Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

From: Toshi Kani
Date: Mon Mar 14 2016 - 14:56:03 EST


On Sat, 2016-03-12 at 17:18 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@xxxxxxx> wrote:
>
> > On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote:
> > > * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > >
> > > >
> > > > * Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > > >
> > > > > MTRR manages PAT initialization as it implements a rendezvous
> > > > > handler that initializes PAT as part of MTRR initialization.
> > > > >
> > > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR
> > > > > simply skips PAT init, which causes PAT left enabled without
> > > > > initialization. [...]
> > > >
> > > > What practical effects does this have to the user? Does the kernel
> > > > crash?
> > >
> > > Btw., I find this omission _highly_ annoying: describing what
> > > negative effects a bug _causes in practice_ is the most important
> > > part of a changelog. How on earth can an experienced contributor omit
> > > such an important component from a patch description?
> > >
> > > Most readers of changelogs couldn't care less about technical details
> > > of how the bug is fixed (of course others will read it so it's nice
> > > to have too), but what symptoms a bug causes, how serious is it,
> > > whether it should be backported are like super important compared to
> > > everything else you wrote - and both the description and the
> > > changelogs are totally silent on those topics ...
> > >
> > > I've seen this in other PAT patches - please try to improve this.
> >
> > My apology. I agree the importance of describing the negative effect of
> > the issue. This case is complicated to describe thoroughly, but here is
> > a summary.
>
> The new changelog looks very good, thanks!
>
> > The issue was reported as a regression caused by 'commit 9cd25aac1f44
> > ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this
> > patchset is to fix this regression.
> > https://lkml.org/lkml/2016/3/3/828
>
> So one thing that matters more than anything else in the changelog, the
> title! Right now the title is:
>
> Â x86/mtrr: Refactor PAT initialization code
>
> ... that's a nice title for a true refactoring of the code, but this
> isn't really that, the purpose of this fix is to fix a bad Xorg crash for
> Qemu users.
>
> The principle you need to remember is that readers of your changelogs
> will be _very happy_ about 'negative' phrases like:
>
> Â bad bug
> Â Xorg crash
> Â boot failure
> Â kernel crash
> Â NULL dereference
>
> I.e. the 'best' title for a bug fix is to characterize it in the most
> negative truthful fashion in the changelog. It sounds a bit
> counterintuitive but it's true.Â
>
> So in this case the best changelog title would be something like:
>
> Â x86/pat: Fix Xorg crashes in Qemu sessions
>
> People will absolutely _love_ such titles, because:
>
> Â - users who are trying to find mysterious Xorg failures can grep for it
> and might find it before it hits a stable kernel they are using
>
> Â - maintainers (like me) are able to see it at a glance that this fix
> should go to Linus more urgently than other fixes. (and definitely more
> urgently than feature patches.)
>
> Â - stable kernel maintainers and distro backporters can see it
> immediately at a glance that they really want this fix.
>
> So by being intentionally and maximally negative in the title, you are
> being very helpful to your fellow developers and users!
>
> Now consider the original title:
>
> Â x86/mtrr: Refactor PAT initialization code
>
> 99% of people will glance over such a title, which is not good.
> Furhermore, maintainers like me will get _annoyed_ at such titles,
> because this neutrally formulated title, while very polite, actively
> hides the important detail that these patches fix real negative bugs for
> real users.
>
> Okay?

Thanks for all the explanation and guidance! That's very helpful. Yes, I
will keep this in mind.

> And please also note that in the Linux kernel no-one ever 'blames' other
> people for bugs. Bugs are part of the human condition and they happen all
> the time as long as they are not introduced by carelessness. So in the
> typical case you cannot possibly socially embarrass any good kernel
> developer by reporting and fixing a bug he introduced. The typical
> reaction you will get is 'oh great, one bug less to worry about!', so
> socially you can be absolutely honest and 'impolite' about the negative
> effects of bugs.

Understood.

> > The negative effects of the issue were two failures in Xorg on qemu32
> > env, which was triggered by the fact that its virtual CPU does not
> > support MTRR.
> > https://lkml.org/lkml/2016/3/4/775
> > Â#1. copy_process() failed in the check in reserve_pfn_range()
> > Â#2. error path in copy_process() then hit WARN_ON_ONCE in
> > untrack_pfn().
>
> Yeah, it's nice to quote actual crash signatures as well (in a short
> form) - because people hitting the crashes often do a google search and
> might find the fix based on such patterns.

Will do.

Thanks!
-Toshi