Re: [PATCH v2] checkpatch: fix false positives in REPEATED_WORD warning

From: Aditya
Date: Fri Oct 23 2020 - 02:33:39 EST


On 23/10/20 4:16 am, Joe Perches wrote:
> On Fri, 2020-10-23 at 02:35 +0530, Aditya wrote:
>> On 23/10/20 1:03 am, Joe Perches wrote:
>>> On Fri, 2020-10-23 at 00:44 +0530, Aditya wrote:
>>>> On 22/10/20 9:40 pm, Joe Perches wrote:
>>>>> On Thu, 2020-10-22 at 20:20 +0530, Aditya Srivastava wrote:
>>>>>> Presence of hexadecimal address or symbol results in false warning
>>>>>> message by checkpatch.pl.
>>>>> []
>>>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>>> []
>>>>>> @@ -3051,7 +3051,10 @@ sub process {
>>>>>> }
>>>>>>
>>>>>> # check for repeated words separated by a single space
>>>>>> - if ($rawline =~ /^\+/ || $in_commit_log) {
>>>>>> +# avoid false positive from list command eg, '-rw-r--r-- 1 root root'
>>>>>> + if (($rawline =~ /^\+/ || $in_commit_log) &&
>>>>>> + $rawline !~ /[bcCdDlMnpPs\?-][rwxsStT-]{9}/) {
>>>>>
>>>>> Alignment and use \b before and after the regex please.
>>>>
>>>> If we use \b either before or after or both it does not match patterns
>>>> such as:
>>>> + -rw-r--r--. 1 root root 112K Mar 20 12:16'
>>> selinux-policy-3.14.4-48.fc31.noarch.rpm
>>>
>>> OK, thanks, it's good you checked.
>>>
>>>>> []
>>>>> What does all this code actually avoid?
>>>>
>>>> Sir, there are multiple variations of hex for which this warning is
>>>> occurring, for eg:
>>>> 1) 00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................
>>>> 2) ffffffff ffffffff 00000000 c070058c
>>>> 3) f5a: 48 c7 44 24 78 ff ff movq
>>>> $0xffffffffffffffff,0x78(%rsp)
>>>> 4) + fe fe
>>>> 5) + fe fe - ? end marker ?
>>>> 6) Code: ff ff 48 (...)
>>>
>>> So why not just match first with /^[0-9a-f]+$/i ?
>>>
>>> Doesn't that match all the cases listed above?
>>>
>>>
>>
>> Then, we'll not be able to account for cases such as:
>>
>> 1) + * sets this to -1, the slack value will be calculated to be be
>> halfway [For 'be' 'be']
>> 2) + * @seg: index of packet segment whose raw fields are to be be
>> extracted [For 'be' 'be']
>> 3) Let's also add add a note about using only the l3 access without l4
>> [For 'add' 'add']
>
> Like the use of long, I think you're better off with
> either a list or hash of specific words to ignore.
>
>
>

Ok Sir. Actually, I was proceeding with this approach initially. But
it has some side affects like it won't be able to detect hex such as:

'ff 4d be be be 9f 04 48'

Here although 'be' is part of hex sequence, it will be reported in the
warning.
However, though such cases haven't occurred over v6..v8.

Actually, we could simplify it more on the basis of occurrences in
v6..v8, for eg. if we check for /[0-9c-f][0-9a-f]+/ (instead of
/[0-9a-f]{2,}/), it gives us desired result over v6..v8, but again
we'll miss out cases such as:

'ff af af bc bc ba ba bd bd'

To counter all these, I had implemented the solution in the proposed
way. But, I'll change it as you suggested. The advantage of it can be
that the code complexity is less and the negatives will be few.

Thanks
Aditya