Re: [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header

From: Joe Perches
Date: Sun Sep 20 2020 - 12:54:49 EST


On Sun, 2020-09-20 at 21:52 +0530, Dwaipayan Ray wrote:
> On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@xxxxxxxxxxx> wrote:
> > On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > > Checkpatch did not handle cases where the author From: header
> > > was split into multiple lines. The author identity could not
> > > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > > warning.
> >
> I think there won't be any problem. Is my
> observation correct?

Likely true, it probably doesn't matter.
Still, I'd imagine it doesn't hurt either.

> > What I have does a bit more by saving any post-folding
> >
> > "From: <name and email address>"
> >
> > and comparing that to any "name and perhaps different
> > email address" in a Signed-off-by: line.
> >
> > A new message is emitted if the name matches but the
> > email address is different.
> >
> > Perhaps it's reasonable to apply your patch and then
> > update it with something like the below:
> > ---
> > scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
> > 1 file changed, 28 insertions(+), 4 deletions(-)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index 3e474072aa90..1ecc179e938d 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1240,6 +1240,15 @@ sub same_email_addresses {
> > $email1_address eq $email2_address;
> > }
> >
> > +sub same_email_names {
> > + my ($email1, $email2) = @_;
> > +
> > + my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> > + my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> > +
> > + return $email1_name eq $email2_name;
> > +}
> > +
> > sub which {
> > my ($bin) = @_;
> >
> > @@ -2679,20 +2688,32 @@ sub process {
> > }
> >
> > # Check the patch for a From:
> > - if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > + if ($line =~ /^From:\s*(.*)/i) {
> > $author = $1;
> > - $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > + my $curline = $linenr;
> > + while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> > + $author .= ' ' if (defined($1));
> > + $author .= "$2";
> > + }
> > + if ($author =~ /=\?utf-8\?/i) {
> > + $author = decode("MIME-Header", $author);
> > + $author = encode("utf8", $author);
> > + }
> > +
> > $author =~ s/"//g;
> > $author = reformat_email($author);
> > }
> >
> > # Check the patch for a signoff:
> > if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> > + my $sig = $1;
> > $signoff++;
> > $in_commit_log = 0;
> > if ($author ne '') {
> > - if (same_email_addresses($1, $author)) {
> > - $authorsignoff = 1;
> > + if (same_email_addresses($sig, $author)) {
> > + $authorsignoff = "1";
> > + } elsif (same_email_names($sig, $author)) {
> > + $authorsignoff = $sig;
> > }
> > }
> > }
> > @@ -6937,6 +6958,9 @@ sub process {
> > } elsif (!$authorsignoff) {
> > WARN("NO_AUTHOR_SIGN_OFF",
> > "Missing Signed-off-by: line by nominal patch author '$author'\n");
> > + } elsif ($authorsignoff ne "1") {
> > + WARN("NO_AUTHOR_SIGN_OFF",
> > + "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
> > }
> > }
> >
> >
>
> Yes, this is definitely more logical !
> I was actually hoping to talk with you on this.

Hope granted, but only via email... (smile)

> The code you sent better handles name mismatches when
> email addresses are same. But I also have found several
> such commits in which the author have signed off using
> a different email address than the one which he/she used
> to send the patch.
>
> For example, Lukas checked commits between v5.4 and
> v5.8 and he found:
> 175 Missing Signed-off-by: line by nominal patch authorrep
> 'Daniel Vetter <daniel.vetter@xxxxxxxx>'
>
> Infact in all of those commits he signed off using a different
> mail, Daniel Vetter <daniel.vetter@xxxxxxxxx>.
>
> So is it possible to resolve these using perhaps .mailmap
> entries? Or should only the name mismatch part be better
> handled? Or perhaps both?

Dunno. It certainly can be improved...
Try adding some more logic and see what you come up with.

btw:

The most frequent NO_AUTHOR_SIGN_OFF messages for v5.7..v5.8 are

98 WARNING: From:/SoB: email address mismatch: 'From: Daniel Vetter <daniel.vetter@xxxxxxxx>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>'
24 WARNING: From:/SoB: email address mismatch: 'From: Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx>' != 'Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx>'
19 WARNING: From:/SoB: email address mismatch: 'From: Wolfram Sang <wsa+renesas@xxxxxxxxxxxxxxxxxxxx>' != 'Signed-off-by: Wolfram Sang <wsa@xxxxxxxxxx>'
11 WARNING: From:/SoB: email address mismatch: 'From: Luke Nelson <lukenels@xxxxxxxxxxxxxxxxx>' != 'Signed-off-by: Luke Nelson <luke.r.nels@xxxxxxxxx>'
8 WARNING: From:/SoB: email address mismatch: 'From: Christophe Leroy <christophe.leroy@xxxxxx>' != 'Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>'
6 WARNING: From:/SoB: email address mismatch: 'From: Davidlohr Bueso <dave@xxxxxxxxxxxx>' != 'Signed-off-by: Davidlohr Bueso <dbueso@xxxxxxx>'
5 WARNING: Missing Signed-off-by: line by nominal patch author '"Paul A. Clarke" <pc@xxxxxxxxxx>'
4 WARNING: Missing Signed-off-by: line by nominal patch author 'Huang Ying <ying.huang@xxxxxxxxx>'
3 WARNING: Missing Signed-off-by: line by nominal patch author '"Stephan Müller" <smueller@xxxxxxxxxx>'

For the Missing Signed-off-by: lines above,
even after decoding, the email matches but
the names do not.

From: "Paul A. Clarke" <pc@xxxxxxxxxx>
[...]
Signed-off-by: Paul Clarke <pc@xxxxxxxxxx>

From: Huang Ying <ying.huang@xxxxxxxxx>
[...]
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>

From: =?UTF-8?q?Stephan=20M=C3=BCller?= <smueller@xxxxxxxxxx>
[...]
Signed-off-by: Stephan Mueller <smueller@xxxxxxxxxx>

> Also, I would like to know if there are any more changes
> required for the current patch or if it is good to go?

I think it's fine.

cheers, Joe