Re: [PATCH] checkpatch: fix false positive for REPEATED_WORD warning

From: Joe Perches
Date: Wed Oct 21 2020 - 11:19:02 EST


On Wed, 2020-10-21 at 20:31 +0530, Aditya Srivastava wrote:
> Presence of hexadecimal address or symbol results in false warning
> message by checkpatch.pl.
>
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
>
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
> 00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....

Right.

> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more 2-lettered words separated by space in a line.

> A quick evaluation on v5.6..v5.8 showed that this fix reduces
> REPEATED_WORD warnings from 2797 to 1043.

Are many of the other 1043 false positives?
Any pattern to them?

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3050,8 +3050,10 @@ sub process {
> }
> }
>
> -# check for repeated words separated by a single space
> - if ($rawline =~ /^\+/ || $in_commit_log) {
> +# check for repeated words separated by a single space and
> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> + if (($rawline =~ /^\+/ || $in_commit_log) &&
> + $rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {

This might be better as \b$Hex so that uppercase and longer
values such as FF FF and FFFFFFFF FFFFFFFF are skipped too.
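
To illustrate the difference between the line-level test in the quoted
patch and a word-level test, here is a minimal standalone sketch. The
pattern $hex_word and both helper subs are hypothetical names made up
for this example; $hex_word is only a stand-in and is not checkpatch's
own $Hex, whose actual definition should be checked before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Line-level pattern copied from the quoted patch: four or more lowercase
# two-character hex words, each followed by at least one space.
my $line_re = qr/(\b[0-9a-f]{2}( )+){4,}/;

# Hypothetical word-level pattern (an assumption, NOT checkpatch's $Hex):
# a whole word made of two or more hex digits, in either case.
my $hex_word = qr/^[0-9a-fA-F]{2,}$/;

# Returns 1 if the whole line would be exempt from the repeated-word check.
sub skipped_by_line_check {
    my ($line) = @_;
    return $line =~ $line_re ? 1 : 0;
}

# Returns 1 if a single repeated word would be exempt from the warning.
sub skipped_by_word_check {
    my ($word) = @_;
    return $word =~ $hex_word ? 1 : 0;
}

for my $line ("81 88 ff ff 00 00", "81 88 FF FF 00 00", "FFFFFFFF FFFFFFFF") {
    printf "%-20s line-level skip: %d\n", $line, skipped_by_line_check($line);
}
for my $word (qw(ff FF FFFFFFFF the be)) {
    printf "%-20s word-level skip: %d\n", $word, skipped_by_word_check($word);
}
```

In this sketch the line-level test misses the uppercase dump and the
pair of long values, while the word-level test catches both. The
trade-off is that a word-level test this loose would also skip ordinary
words spelled only with hex letters, such as a repeated 'be'.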

I might instead add that check on the line below, where
the repeated words are already checked against 'long':
---
scripts/checkpatch.pl | 1 +
1 file changed, 1 insertion(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index fab38b493cef..929866999f81 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3062,6 +3062,7 @@ sub process {

next if ($first ne $second);
next if ($first eq 'long');
+ next if ($first =~ /^$Hex$/);

if (WARN("REPEATED_WORD",
"Possible repeated word: '$first'\n" . $herecurr) &&