Re: [BUG] checkpatch: false positive for commits with quote characters

From: Brian Norris
Date: Thu Dec 03 2015 - 20:39:58 EST


On Thu, Dec 03, 2015 at 04:36:28PM -0800, Joe Perches wrote:
> On Thu, 2015-12-03 at 16:29 -0800, Joe Perches wrote:
> > On Thu, 2015-12-03 at 16:13 -0800, Brian Norris wrote:
> > > Ping? I've hit some different false positives today on the same rule.
> > > I'll stop bothering to report them if no one cares.
> >
> > Perhaps this:

Aside from the non-breaking spaces your copy of Evolution inserted, it's
an improvement. (It doesn't give a false positive for the patch I
reported.) But it's still got some holes.

> (minus the debugging this time...)
> ---
>  scripts/checkpatch.pl | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index d4960f7..b23dff8 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2397,20 +2397,20 @@ sub process {
>   $long = 1 if ($line =~ /\bcommit\s+[0-9a-f]{41,}/i);
>   $space = 0 if ($line =~ /\bcommit [0-9a-f]/i);
>   $case = 0 if ($line =~ /\b[Cc]ommit\s+[0-9a-f]{5,40}[^A-F]/);
> - if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)"\)/i) {
> + if ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("(.*)"\)/i) {

You're got the reverse problem now. This will match too broadly. For
instance, if there is another instance of terminating quote +
parentheses on the same line, then we'll get a false postive. e.g., if I
wrote a patch that included something like this:

Commit 6dc0dcde406b ("parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool")

Or even if it's wrapped like this:

Commit 6dc0dcde406b ("parisc: Use
platform_device_register_simple("rtc-generic")") is cool (as in "cool")

Then the regexes will think the description was:

parisc: Use platform_device_register_simple("rtc-generic")") is cool (as in "cool

A hacky workaround for that one: only check for (loosely, not proper
regex syntax):

parsed_description =~ description . ")";

rather than:

description == parsed_description

Not sure how far you want to go on chasing false positives...

>   $orig_desc = $1;
>   $hasparens = 1;
>   } elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s*$/i &&
>    defined $rawlines[$linenr] &&
> -  $rawlines[$linenr] =~ /^\s*\("([^"]+)"\)/) {
> +  $rawlines[$linenr] =~ /^\s*\("(.*)"\)/) {
>   $orig_desc = $1;
>   $hasparens = 1;
>   } elsif ($line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("[^"]+$/i &&
>    defined $rawlines[$linenr] &&
> -  $rawlines[$linenr] =~ /^\s*[^"]+"\)/) {
> +  $rawlines[$linenr] =~ /^\s*.*"\)/) {
>   $line =~ /\bcommit\s+[0-9a-f]{5,}\s+\("([^"]+)$/i;
>   $orig_desc = $1;
> - $rawlines[$linenr] =~ /^\s*([^"]+)"\)/;
> + $rawlines[$linenr] =~ /^\s*(.*)"\)/;
>   $orig_desc .= " " . $1;
>   $hasparens = 1;
>   }

BTW, another false positive: just including this text in a commit
triggers a different one:

http://lkml.kernel.org/g/20151113220039.GA74382@xxxxxxxxxx

A simple hack (appended, in addition to yours) would be to assume that
when people are trying to include badly-formatted commit hashes, they
will be preceding them with whitespace, the beginning of a line, or
encapsulating punctuation. Or use a URL parser.

Brian

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d2993f19df3f..e7110ba3242b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2349,7 +2349,7 @@ sub process {
# Check for git id commit length and improperly formed commit descriptions
if ($in_commit_log && !$commit_log_possible_stack_dump &&
($line =~ /\bcommit\s+[0-9a-f]{5,}\b/i ||
- ($line =~ /\b[0-9a-f]{12,40}\b/i &&
+ ($line =~ /\b[\s^\("][0-9a-f]{12,40}\b/i &&
$line !~ /[\<\[][0-9a-f]{12,40}[\>\]]/i &&
$line !~ /\bfixes:\s*[0-9a-f]{12,40}/i))) {
my $init_char = "c";
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/