Re: 463 kernel developers missing!

From: Jon Smirl
Date: Mon Jul 28 2008 - 20:14:33 EST


On 7/28/08, Paul Mundt <lethal@xxxxxxxxxxxx> wrote:
> On Tue, Jul 29, 2008 at 08:01:09AM +1000, James Morris wrote:
> > On Mon, 28 Jul 2008, Randy Dunlap wrote:
> >
> > > It would be Good if we could give more value to Reviewed-by: tag lines also...
> > >
> > > IOW, we "need" to do this. :)
> >
> > Also, Tested-by:, to encourage and recognize people who may not be
> > confident in reviewing code to at least test it, which is immensely
> > useful if done thoughtfully.
> >
> > "Measuring programming progress by lines of code is like measuring
> > aircraft building progress by weight."
> >
> > If you know who said this, award yourself a cookie :-)
> >
>
> Or just filter on "-by:", which seems to get anything relevant, including
> people that shamelessly make up their own tags. In order for something to
> be converted from a Cc: to a *-by: requires manual effort at least, which
> ought to be sufficient for recognition.
>
> If someone was really bored they could probably make a table of tags with
> various points to try and balance things slightly more objectively.
> Though it seems we now at least have totally different metrics on LWN,
> for the kernel summit selection process, and Jon's new script. ;-)
>
> Trying to map all of the names seems pretty pointless though, most
> regular contributors contribute in a fairly consistent and sane manner,
> with the odd mismatch or typo here or there. It might make sense for
> anyone where there's a significant difference, but those are going to be
> corner cases.

12% of the name/email pairs are messed up, and it's not all simple typos.
There is significant mangling of non-ASCII charsets by people's tools
in the maintainer's chain of processing. Half of the time I don't
believe that what the author submitted is what ends up in the log,
because of that mangling. It's a larger source of noise than typos.

All of these variations on email names are in the log. Humans can
identify these problems; it is much harder for a machine.

For example, where are these backslashes coming from?
Auke-Jan H Kok <auke-jan.h.kok@xxxxxxxxx>
Auke-Jan H Kok <auke\-jan.h.kok@xxxxxxxxx>
Auke-Jan H Kok <auke\\-jan.h.kok@xxxxxxxxx>
Auke-Jan H Kok <auke\\\-jan.h.kok@xxxxxxxxx>
Auke-Jan H Kok <sofar@xxxxxxxxxxxxxxxx>
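
The backslash variants above can at least be collapsed mechanically. Here is
a minimal sketch in Python, assuming a backslash never legitimately appears
in an address in the log (it is technically allowed inside a quoted local
part, so that is an assumption). The function name and the example.com
addresses are made up for illustration; the real ones are obscured above.

```python
def strip_backslashes(address: str) -> str:
    """Drop every backslash from an address string.

    Assumption: backslashes in the log are escaping debris added by
    some tool, not part of the real address.
    """
    return address.replace("\\", "")


# Hypothetical stand-ins for the escaped variants listed above.
variants = [
    "auke-jan.h.kok@example.com",
    "auke\\-jan.h.kok@example.com",
    "auke\\\\-jan.h.kok@example.com",
    "auke\\\\\\-jan.h.kok@example.com",
]

# All four spellings collapse to a single cleaned address.
cleaned = {strip_backslashes(v) for v in variants}
```

This does nothing for the last entry in the list, which is a genuinely
different address and still needs a human or a mapping table.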

Are the tools case sensitive or insensitive on email addresses? Some
are, some aren't, so I need these cases...
Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Al Viro <viro@xxxxxxxxxxxxxxxxxx>
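
One way to sidestep the case question is to fold case before comparing. A
rough sketch, assuming it is acceptable to treat the local part as case
insensitive too (the domain part always is; the local part strictly is not,
though most mail systems treat it that way). The function name, the flag and
the example address are hypothetical.

```python
def fold_address(address: str, fold_local: bool = True) -> str:
    """Return a case-folded form of an address for comparison.

    The domain is always lowercased. The local part is lowercased
    only when fold_local is true, which is an assumption about how
    the mail system behind the address behaves.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" at all: treat the whole string as a local part.
        return address.lower() if fold_local else address
    if fold_local:
        local = local.lower()
    return local + "@" + domain.lower()
```

With this, "Viro@Example.ORG" and "viro@example.org" compare equal, while
fold_local=False keeps the local part as the author wrote it.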

Another problem is internal machine names...
David S. Miller <davem@xxxxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxxxx>
David S. Miller <davem@xxxxxxxxxxxxxxxxxx>

Or varying the email name...
Alexey Starikovskiy <alexey.y.starikovskiy@xxxxxxxxx>
Alexey Starikovskiy <alexey_y_starikovskiy@xxxxxxxxxxxxxxx>
Alexey Starikovskiy <alexey.y.starikovskiy@xxxxxxxxxxxxxxx>
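
For the machine-name and varying-address cases above, a script cannot safely
decide which addresses belong together, but it can list the candidates for a
human to check. A minimal sketch, assuming log entries look like
"Name <address>" and that an identical display name is a good enough hint.
The regex, the function name and the sample addresses are all made up for
illustration.

```python
import re
from collections import defaultdict

# Matches "Display Name <address>"; anything else is skipped.
ENTRY = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<addr>[^<>]*)>\s*$")


def group_by_name(lines):
    """Map each display name to its distinct addresses.

    Only names seen with more than one address are returned, since
    those are the ones that need a manual look.
    """
    groups = defaultdict(set)
    for line in lines:
        match = ENTRY.match(line)
        if match:
            groups[match.group("name")].add(match.group("addr").lower())
    return {
        name: sorted(addrs)
        for name, addrs in groups.items()
        if len(addrs) > 1
    }


# Hypothetical entries standing in for the obscured ones above.
sample = [
    "Alexey Starikovskiy <alexey.y.starikovskiy@a.example>",
    "Alexey Starikovskiy <alexey_y_starikovskiy@b.example>",
    "Al Viro <viro@c.example>",
]
suspects = group_by_name(sample)
```

Two different people with the same name would be grouped together by this,
so the output is a review list and not a merge list.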

Why do these all end in (none)?
Craig Hughes <craig@xxxxxxxx(none)>
Dave Neuer <dneuer@xxxxxxxx(none)>
David Brownell <david-b@xxxxxxxx(none)>
David Woodhouse <dwmw2@xxxxxxxx(none)>
Deepak Saxena <dsaxena@xxxxxxxx(none)>
Enrico Scholz <enrico.scholz@xxxxxxx(none)>
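
Whatever the cause, these are easy to flag. A small sketch, assuming the
literal string "(none)" at the end of the address is the only marker; the
function names and the sample addresses are hypothetical.

```python
def has_none_suffix(address: str) -> bool:
    """True if the address ends in the literal placeholder "(none)"."""
    return address.strip().endswith("(none)")


def flag_placeholder_hosts(addresses):
    """Return the addresses that carry the "(none)" placeholder."""
    return [a for a in addresses if has_none_suffix(a)]


# Hypothetical addresses; the real hosts are obscured above.
flagged = flag_placeholder_hosts(
    ["craig@host.(none)", "viro@example.org"]
)
```

Flagging is as far as this goes. Recovering the intended address still needs
another entry for the same person or a hand-maintained table.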

--
Jon Smirl
jonsmirl@xxxxxxxxx