[PATCH] checkpatch: fix false positive for REPEATED_WORD warning

From: Aditya Srivastava
Date: Wed Oct 21 2020 - 11:01:45 EST


Hexadecimal addresses or symbols in a line cause checkpatch.pl to emit
a false REPEATED_WORD warning.

For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
memory leak in mptcp_subflow_create_socket()") results in warning:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....

Here, it reports 'ff' to be repeated, but it is in fact part of an
address or code dump, where it has to be repeated.
The warning is intended to find stylistic issues in commit messages,
so in this case it is simply wrong.

To avoid such reports, skip the repeated-word check on any line that
contains a run of 4 or more two-character hex tokens (lowercase
[0-9a-f]), each followed by one or more spaces.

A quick evaluation on v5.6..v5.8 showed that this fix reduces
REPEATED_WORD warnings from 2797 to 1043.

A quick manual check found that all of the dropped cases are related to
hex output in commit messages.

Signed-off-by: Aditya Srivastava <yashsri421@xxxxxxxxx>
---
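As a rough illustration of the additional check described above, here is
a minimal sketch in Python rather than Perl. The regex is copied from the
patch; the helper name skips_repeated_word_check and the sample lines
other than the hex dump are hypothetical and only serve the demonstration.
It is not part of checkpatch.pl.

```python
import re

# Python transcription of the Perl pattern added by the patch:
# 4 or more two-character lowercase hex tokens, each followed by
# one or more spaces.
HEX_RUN = re.compile(r"(\b[0-9a-f]{2}( )+){4,}")


def skips_repeated_word_check(line: str) -> bool:
    """Hypothetical helper: True if the line would be exempt from the
    repeated-word check because it contains a hex-dump-like run."""
    return HEX_RUN.search(line) is not None


# The hex dump line quoted in the commit message.
hex_dump = "00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0....."
# Made-up lines for comparison.
prose = "Fix the the memory leak"
short_run = "add be be to the list"
upper_hex = "FF FF FF FF FF"

for sample in (hex_dump, prose, short_run, upper_hex):
    print(skips_repeated_word_check(sample), repr(sample))
```

Under these assumptions, only the hex dump line is exempt. Ordinary prose
and short runs such as 'be be' are still checked, and an uppercase dump is
not exempt because the pattern has no case-insensitive modifier.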
scripts/checkpatch.pl | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9b9ffd876e8a..78aeb7a3ca3d 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3050,8 +3050,10 @@ sub process {
}
}

-# check for repeated words separated by a single space
- if ($rawline =~ /^\+/ || $in_commit_log) {
+# check for repeated words separated by a single space and
+# avoid repeating hex occurrences like 'ff ff fe 09 ...'
+ if (($rawline =~ /^\+/ || $in_commit_log) &&
+ $rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {

my $first = $1;
--
2.17.1