Re: [PATCH] checkpatch: fix false positive for REPEATED_WORD warning

From: Aditya
Date: Wed Oct 21 2020 - 13:56:07 EST


On 21/10/20 10:20 pm, Joe Perches wrote:
> On Wed, 2020-10-21 at 08:28 -0700, Joe Perches wrote:
>> On Wed, 2020-10-21 at 08:18 -0700, Joe Perches wrote:
>>> I might add that check to the line below where
>>> the repeated words are checked against long
>> []
>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> []
>>> @@ -3062,6 +3062,7 @@ sub process {
>>>
>>> next if ($first ne $second);
>>> next if ($first eq 'long');
>>> + next if ($first =~ /^$Hex$/;
>>
>> oops. with a close parenthesis added of course...
>
> That doesn't work as $Hex expects a leading 0x.
>
> But this does...
>
> The negative of this approach is it would also not emit
> a warning on these repeated words: (doesn't seem too bad)
>
> $ grep -P '^[0-9a-f]{2,}$' /usr/share/dict/words
> abed
> accede
> acceded
> ace
> aced
> ad
> add
> added
> baa
> baaed
> babe
> bad
> bade
> be
> bead
> beaded
> bed
> bedded
> bee
> beef
> beefed
> cab
> cabbed
> cad
> cede
> ceded
> dab
> dabbed
> dad
> dead
> deaf
> deb
> decade
> decaf
> deed
> deeded
> deface
> defaced
> ebb
> ebbed
> efface
> effaced
> fa
> facade
> face
> faced
> fad
> fade
> faded
> fed
> fee
> feed
> ---
> scripts/checkpatch.pl | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..79d7a4cba19e 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3062,6 +3062,7 @@ sub process {
>
> next if ($first ne $second);
> next if ($first eq 'long');
> + next if ($first =~ /^[0-9a-f]+$/i);
>
> if (WARN("REPEATED_WORD",
> "Possible repeated word: '$first'\n" . $herecurr) &&
>
>
>

Hi Sir,
Thanks for your feedback. I ran a manual check using this approach
over v5.6..v5.8.
The genuine warnings that this approach suppresses are for the words
'be' (5 occurrences) and 'add' (1 occurrence). For example:

WARNING:REPEATED_WORD: Possible repeated word: 'be'
#278: FILE: drivers/net/ethernet/intel/ice/ice_flow.c:388:
+ * @seg: index of packet segment whose raw fields are to be be extracted

WARNING:REPEATED_WORD: Possible repeated word: 'add'
#21:
Let's also add add a note about using only the l3 access without l4

Apart from these, it works as expected. It also handles lines with
many repeated hex words, as you mentioned. For example:

WARNING:REPEATED_WORD: Possible repeated word: 'ffff'
#15:
0x0040: ffff ffff ffff ffff ffff ffff ffff ffff

These cases were getting missed with my approach.

It also catches false positives for hex sequences that are repeated
fewer than 4 times (here, twice). For example:

WARNING:REPEATED_WORD: Possible repeated word: 'ff'
#38:
Code: ff ff 48 (...)
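
To make the trade-off concrete, here is a minimal standalone sketch that
applies the same pattern as the proposed check, /^[0-9a-f]+$/i, to the
words from the examples above. It is not checkpatch code: hex_like is a
hypothetical helper name, and grep -iE only stands in for the Perl match.

```shell
# Hypothetical helper, not part of checkpatch: prints each argument that
# consists solely of hex digits (case-insensitive), i.e. the words for
# which the proposed check would skip the REPEATED_WORD warning.
hex_like() {
	printf '%s\n' "$@" | grep -iE '^[0-9a-f]+$'
}

# 'be' and 'add' are real repeated words; 'ffff' and 'ff' come from dumps.
hex_like be add ffff ff long the
```

This prints be, add, ffff and ff, one per line: the pattern cannot tell an
English word made up of the letters a-f from a hex value, which is why the
'be be' and 'add add' warnings are lost along with the false positives.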

I'll try to combine both methods and come up with a better approach.

Aditya