Re: 463 kernel developers missing!

From: Jon Smirl
Date: Mon Jul 28 2008 - 18:08:45 EST


On 7/28/08, Dave Jones <davej@xxxxxxxxxx> wrote:
> On Mon, Jul 28, 2008 at 04:22:36PM -0400, Theodore Tso wrote:
> > On Mon, Jul 28, 2008 at 03:00:13PM -0400, Jon Smirl wrote:
> > > Other people aren't perfect, I've found over 1,000 typos in those
> > > names and emails. We need a validation mechanism.
> > >
> >
> > You keep using the word "need"; I do not think it means what you think
> > it does. :-)
> >
> > Seriously, why is it so important? It's a nice to have, and I
> > recognize that you've spent a bunch of time on it. But if the goal is
> > to get better statistics, and in exchange we forcibly map all Mark
> > Browns to one e-mail address, and/or force them to all adopt middle
> > initials (what if there are two Dan Smiths that don't have middle
> > initials) just for the convenience of your statistics gathering, I
> > would gently suggest to you that you've forgotten which is the tail,
> > and which is the dog.
>
>
> I'm beginning to question just how useful the continued measuring
> of things like Signed-off-by's is. Last week at OLS, I overheard
> a conversation where someone was talking about the "top 10" lists
> that Greg has been talking about at various conferences.
> The conversation went along the lines of "my manager really wants
> to see us on that list, at any cost".

I didn't do this to measure statistics, I did it because I was writing
a script and the script was getting garbage for input. It just had the
side effect of cleaning up the statistics.

> Whilst the naive may think 'more patches == more better', this isn't
> necessarily the case given we have nowhere near enough review bandwidth
> *now*, and flooding with a zillion trivial patches really isn't going
> to make that job any easier.
>
> Getting patches into the tree is easy, we've proven that.
> As things stand now, it's also fairly easy to 'game' the system
> by committing something in 10 changesets when it could be done
> just as easily in 2-3.
>
> How about we start measuring things that actually matter, like..
>
> "How many patches were reviewed before they went in"
> "How many patches were directly responsible for a bug"
> "How many patches actually fixed something anyone cares about"
> "How many patches are responsible for just 'churn'"
>

These are good topics for the Plumbers conference. But to ask these
questions we need to get the data into a format where a computer can
process it. Syntax checking, validation, etc. are needed on the log
messages. I'm not going to hunt through 100,000 commits trying to
answer these by hand.
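
As a rough illustration of the kind of syntax check meant here, the
sketch below scans a commit log message for tag lines such as
Signed-off-by and separates well-formed ones from lines that only look
like an attempt at one. The list of accepted tags, the regular
expressions and the function name check_log_message are assumptions made
for this example; they are not an existing kernel tool or policy.

```python
import re

# Assumed set of accepted tags; a real policy would have to be agreed on.
TRAILER_RE = re.compile(
    r"^(?P<tag>Signed-off-by|Acked-by|Reviewed-by|Tested-by|Reported-by):\s*"
    r"(?P<name>[^<>]+?)\s*<(?P<email>[^<>@\s]+@[^<>@\s]+\.[^<>@\s]+)>\s*$"
)

# Loose pattern: anything that starts like "Something-by" is treated as
# an attempted tag line, so typos and missing colons can be reported.
LOOKS_LIKE_TRAILER = re.compile(r"^\s*[A-Za-z-]+-by\b", re.IGNORECASE)


def check_log_message(message):
    """Return (valid, malformed) for one commit log message.

    valid is a list of (tag, name, email) tuples with the email lowercased;
    malformed is a list of raw lines that look like tags but fail the check.
    """
    valid, malformed = [], []
    for line in message.splitlines():
        match = TRAILER_RE.match(line)
        if match:
            valid.append((match["tag"], match["name"], match["email"].lower()))
        elif LOOKS_LIKE_TRAILER.match(line):
            malformed.append(line)
    return valid, malformed


if __name__ == "__main__":
    example = (
        "foo: fix a typo\n\n"
        "Signed-off-by: Jane Doe <Jane@Example.org>\n"
        "Signed-of-by: John Roe <john@example.org>\n"
    )
    valid, malformed = check_log_message(example)
    print("valid:", valid)
    print("malformed:", malformed)
```

Run at commit time, a check like this would reject the misspelled tag
before it reached the history; run over the existing log, it only
produces a list of lines for a human to look at.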

Another fun experiment would be to load an archive of LKML, kernel
bugzilla and the kernel source history into git and then try to link
everything together. The cleaner the data is, the easier it will be to
link things. How about a GUI where each patch is annotated with a link
to the email thread discussing it?
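
One possible way to do the linking, sketched under simplifying
assumptions: match each commit's subject line against mail subjects
after stripping "Re:" and bracketed prefixes such as "[PATCH 2/5]". The
helper names and the (id, subject) input format are hypothetical, and
exact subject matching is only a stand-in for whatever heuristic would
work on real data.

```python
import re
from collections import defaultdict

# Strips any run of leading "Re:" / "Fwd:" markers and "[...]" tags.
PREFIX_RE = re.compile(r"^\s*(?:(?:re|fwd?):\s*|\[[^\]]*\]\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Drop reply/patch prefixes, collapse whitespace, and lowercase."""
    stripped = PREFIX_RE.sub("", subject)
    return " ".join(stripped.split()).lower()


def link_commits_to_threads(commits, messages):
    """Map each commit id to the message ids sharing its normalized subject.

    commits:  iterable of (commit_id, subject) pairs
    messages: iterable of (message_id, subject) pairs
    """
    index = defaultdict(list)
    for message_id, subject in messages:
        index[normalize_subject(subject)].append(message_id)
    return {
        commit_id: list(index.get(normalize_subject(subject), []))
        for commit_id, subject in commits
    }


if __name__ == "__main__":
    commits = [("abc123", "foo: fix null pointer in bar()")]
    messages = [
        ("<1@example>", "[PATCH 2/5] foo: fix null pointer in bar()"),
        ("<2@example>", "Re: [PATCH 2/5] foo: fix null pointer in bar()"),
    ]
    print(link_commits_to_threads(commits, messages))
```

Subjects that were edited between posting and commit will not match
this way, which is one more reason the cleanliness of the data matters.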

--
Jon Smirl
jonsmirl@xxxxxxxxx