Re: [PATCH] checkpatch: fix false positive for REPEATED_WORD warning

From: Joe Perches
Date: Wed Oct 21 2020 - 15:26:24 EST


On Thu, 2020-10-22 at 00:40 +0530, Aditya wrote:
> On 21/10/20 8:48 pm, Joe Perches wrote:
> > On Wed, 2020-10-21 at 20:31 +0530, Aditya Srivastava wrote:
> > > Presence of hexadecimal address or symbol results in false warning
> > > message by checkpatch.pl.
> > >
> > > For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> > > memory leak in mptcp_subflow_create_socket()") results in warning:
> > >
> > > WARNING:REPEATED_WORD: Possible repeated word: 'ff'
> > > 00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....
> >
> > Right.
> >
> > > To avoid all such reports, add an additional regex check for a repeating
> > > pattern of 4 or more 2-lettered words separated by space in a line.
> > > A quick evaluation on v5.6..v5.8 showed that this fix reduces
> > > REPEATED_WORD warnings from 2797 to 1043.
> >
> > Are many of the other 1043 false positives?
> > Any pattern to them?
> >
> Apart from the changes suggested by Dwaipayan in
> https://lore.kernel.org/linux-kernel-mentees/20201017162732.152351-1-dwaipayanray1@xxxxxxxxx/
>
> The 'ls -l' output seems to be another common false positive for
> REPEATED_WORD (106 occurrences over v5.6..v5.8). For example:
>
> WARNING:REPEATED_WORD: Possible repeated word: 'root'
> #18:
> drwxr-xr-x. 2 root root 0 Apr 17 10:53 .
[]
> @@ -3050,8 +3050,10 @@ sub process {
> }
> }
>
> - if ($rawline =~ /^\+/ || $in_commit_log) {
> + if (($rawline =~ /^\+/ || $in_commit_log) &&
> + $rawline !~ /\b[a-z-]+.* \d{1,3} [a-zA-Z]+ \w+ +\d+ \w{3} \d{1,2} \d{1,2}:\d{1,2}/) {

Perhaps a regex for permissions is good enough

$line !~ /\b[cbdl-][rwxs-]{9,9}\b/
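
As a rough way to compare the two patterns discussed above, here is a minimal sketch that runs both against the 'ls -l' line from the report. It is written in Python rather than Perl purely for illustration (the regex syntax used here behaves the same way in both), and it is not part of checkpatch. The names LS_LINE, PERMS, LISTING_DIR, LISTING_FILE and PROSE are hypothetical, and LISTING_FILE is a made-up sample line, not one taken from a real commit.

```python
import re

# Pattern from the quoted patch: tries to match a whole `ls -l` style line.
LS_LINE = re.compile(
    r"\b[a-z-]+.* \d{1,3} [a-zA-Z]+ \w+ +\d+ \w{3} \d{1,2} \d{1,2}:\d{1,2}"
)

# Pattern suggested in the reply: matches only the permission string.
PERMS = re.compile(r"\b[cbdl-][rwxs-]{9,9}\b")

# The line from the report that triggered the 'root' warning.
LISTING_DIR = "drwxr-xr-x. 2 root root 0 Apr 17 10:53 ."
# Hypothetical sample: a regular file, no trailing '.' after the permissions.
LISTING_FILE = "-rw-r--r-- 1 root root 0 Apr 17 10:53 file"
# Ordinary prose with a genuine repeated word; neither pattern should match.
PROSE = "the the quick brown fox"

for line in (LISTING_DIR, LISTING_FILE, PROSE):
    print(
        f"{line!r}: ls-line={bool(LS_LINE.search(line))} "
        f"perms={bool(PERMS.search(line))}"
    )
```

Both patterns match the directory line from the report and leave the prose line alone. One thing that may be worth checking before relying on the shorter pattern: because '-' is not a word character, the \b anchors do not match next to a leading or trailing '-', so a permission string such as "-rw-r--r--" followed by a space is not caught by it in this sketch, while the longer pattern still matches that line.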