Re: [PATCH v2] checkpatch: fix false positives in REPEATED_WORD warning

From: Joe Perches
Date: Thu Oct 22 2020 - 18:47:07 EST


On Fri, 2020-10-23 at 02:35 +0530, Aditya wrote:
> On 23/10/20 1:03 am, Joe Perches wrote:
> > On Fri, 2020-10-23 at 00:44 +0530, Aditya wrote:
> > > On 22/10/20 9:40 pm, Joe Perches wrote:
> > > > On Thu, 2020-10-22 at 20:20 +0530, Aditya Srivastava wrote:
> > > > > The presence of a hexadecimal address or symbol results in a false
> > > > > warning message from checkpatch.pl.
> > > > []
> > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > []
> > > > > @@ -3051,7 +3051,10 @@ sub process {
> > > > > }
> > > > >
> > > > > # check for repeated words separated by a single space
> > > > > - if ($rawline =~ /^\+/ || $in_commit_log) {
> > > > > +# avoid false positive from list command, e.g. '-rw-r--r-- 1 root root'
> > > > > + if (($rawline =~ /^\+/ || $in_commit_log) &&
> > > > > + $rawline !~ /[bcCdDlMnpPs\?-][rwxsStT-]{9}/) {
> > > >
> > > > Alignment and use \b before and after the regex please.
> > >
> > > If we use \b before the regex, after it, or on both sides, it does not
> > > match patterns such as:
> > > + -rw-r--r--. 1 root root 112K Mar 20 12:16 selinux-policy-3.14.4-48.fc31.noarch.rpm
> >
> > OK, thanks, it's good you checked.
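A standalone sketch for checking the \b observation: it runs the v2 pattern, with and without \b on both sides, against the ls -l line quoted above. The helper name looks_like_ls_perms is hypothetical and not part of checkpatch.pl. The bounded form fails because the permission string begins and ends with '-', a non-word character, and its neighbours in this line (a space before, '.' after) are non-word characters too, so \b has no boundary to match on either side.

```perl
use strict;
use warnings;

# Permission-string pattern from the v2 patch.
my $perms       = qr/[bcCdDlMnpPs\?-][rwxsStT-]{9}/;
# The same pattern wrapped in \b on both sides, as suggested in review.
my $perms_bound = qr/\b[bcCdDlMnpPs\?-][rwxsStT-]{9}\b/;

# Hypothetical helper: returns 1 if the line contains an ls -l style
# permission string according to the given pattern, else 0.
sub looks_like_ls_perms {
    my ($line, $re) = @_;
    return $line =~ $re ? 1 : 0;
}

my $line = "+ -rw-r--r--. 1 root root 112K Mar 20 12:16 selinux-policy-3.14.4-48.fc31.noarch.rpm";

# '-' is a non-word character, and so are the space before it and the '.'
# after the final '-', so neither \b can succeed at those positions.
printf "plain:   %d\n", looks_like_ls_perms($line, $perms);
printf "with \\b: %d\n", looks_like_ls_perms($line, $perms_bound);
```

Running it prints 1 for the plain pattern and 0 for the bounded one.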
> >
> > > > []
> > > > What does all this code actually avoid?
> > >
> > > Sir, there are multiple variations of hex dumps for which this warning
> > > occurs, e.g.:
> > > 1) 00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................
> > > 2) ffffffff ffffffff 00000000 c070058c
> > > 3) f5a: 48 c7 44 24 78 ff ff movq $0xffffffffffffffff,0x78(%rsp)
> > > 4) + fe fe
> > > 5) + fe fe - ? end marker ?
> > > 6) Code: ff ff 48 (...)
> >
> > So why not just match first with /^[0-9a-f]+$/i ?
> >
> > Doesn't that match all the cases listed above?
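The hex-only check can be tried with a simplified repeated-word loop. This is a rough stand-in written for illustration, not the actual REPEATED_WORD code in checkpatch.pl; repeated_words and its $skip_hex flag are hypothetical. With the filter on, the repeated pairs in hex dump 1) are suppressed, but so is 'be be', because 'be' consists entirely of hex digits.

```perl
use strict;
use warnings;

# Hypothetical, simplified repeated-word finder: reports each "word word"
# pair separated by a single space. Not the real checkpatch.pl code.
sub repeated_words {
    my ($line, $skip_hex) = @_;
    my @hits;
    # Capture a word, then peek at the next word without consuming it,
    # so every adjacent pair is examined.
    while ($line =~ /\b(\w+) (?=(\w+)\b)/g) {
        my ($first, $second) = ($1, $2);
        next if $first ne $second;
        # Suggested filter: ignore the pair if the word is all hex digits.
        next if $skip_hex && $first =~ /^[0-9a-f]+$/i;
        push @hits, $first;
    }
    return @hits;
}

for my $sample ("00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................",
                "the slack value will be calculated to be be halfway") {
    my @all  = repeated_words($sample, 0);
    my @kept = repeated_words($sample, 1);
    print "unfiltered: [@all]  hex-filtered: [@kept]\n";
}
```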
> >
> >
>
> Then we won't be able to catch cases like these, where the repeated word
> consists only of hex digits:
>
> 1) + * sets this to -1, the slack value will be calculated to be be halfway [For 'be' 'be']
> 2) + * @seg: index of packet segment whose raw fields are to be be extracted [For 'be' 'be']
> 3) Let's also add add a note about using only the l3 access without l4 [For 'add' 'add']

As with the existing exception for 'long', I think you're better off with
either a list or a hash of specific words to ignore.
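One possible reading of the hash suggestion, sketched with a simplified loop (not the actual checkpatch.pl code): keep skipping all-hex pairs, but still report a small set of English words that happen to be valid hex. The hash %hex_looking_words and its contents are hypothetical, chosen for illustration. This keeps the hex dumps quiet while still flagging 'be be' and 'add add'.

```perl
use strict;
use warnings;

# Hypothetical hash of English words made only of hex digits; repeats of
# these are still reported even though they look like hex.
my %hex_looking_words = map { $_ => 1 } qw(add added bad be);

# Hypothetical, simplified repeated-word finder (not checkpatch.pl code).
sub repeated_words {
    my ($line) = @_;
    my @hits;
    while ($line =~ /\b(\w+) (?=(\w+)\b)/g) {
        my ($first, $second) = ($1, $2);
        next if $first ne $second;
        # Skip hex-dump noise unless the token is a known English word.
        next if $first =~ /^[0-9a-f]+$/i && !$hex_looking_words{lc $first};
        push @hits, $first;
    }
    return @hits;
}

for my $sample ("00 c0 06 16 00 00 ff ff 00 93 1c 18 00 00 ff ff ................",
                "ffffffff ffffffff 00000000 c070058c",
                "whose raw fields are to be be extracted",
                "Let's also add add a note about using only the l3 access without l4") {
    my @hits = repeated_words($sample);
    print "[@hits] $sample\n";
}
```

A hash keeps the exception list cheap to look up and easy to extend as new false negatives turn up.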