Re: [PATCH 1/3] scripts: add spelling_sanitizer.sh script

From: Leizhen (ThunderTown)
Date: Tue Jun 22 2021 - 04:47:36 EST




On 2021/6/16 19:58, Leizhen (ThunderTown) wrote:
>
>
> On 2021/6/15 15:01, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/6/11 23:36, Joe Perches wrote:
>>> On Fri, 2021-06-11 at 15:12 +0800, Zhen Lei wrote:
>>>> The file scripts/spelling.txt records a large number of
>>>> "mistake||correction" pairs. These entries are currently maintained in
>>>> order, but the ordering is not strict. In addition, when someone wants to
>>>> add new pairs, they must either sort them manually or write a script,
>>>> which is clearly a waste of labor.
>>>
>>> Try using lintian's make sort
>>>
>>> https://salsa.debian.org/lintian/lintian
>
> I installed lintian and found no option that supports sorting. Can anyone give me
> more specific instructions on how to use it?
>
> Although I don't understand Perl, after reading commit 66b47b4a9dad
> ("checkpatch: look for common misspellings"), it seems to match from top to bottom.
> So, as Andy Shevchenko says, the entries should be sorted by frequency of word usage.
>
> I really don't know the details of the implementation of
> scripts/checkpatch.pl --types=typo_spelling. Are only misspelled words involved in
> spelling.txt matching? If correctly spelled words are also traversed, sorting by
> frequency makes no sense, because correctly spelled words far outnumber misspelled
> ones. If that's the case, then my modified script could come in handy.
>
> And if only misspelled words are involved in spelling.txt matching, do we really need
> spelling.txt? Just outputting the misspelled words would be enough. I don't think
> anyone needs to follow the suggested corrections to complete the fix.

Hi all,

I did a little test:
git rm -r drivers/usb --> then revert to generate patch 'usb', 553988 insertions(+)
git rm -r mm/ --> then revert to generate patch 'mm', 157606 insertions(+)
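
For anyone who wants to regenerate the test patches, the two steps above can be sketched roughly as follows. This is only a sketch under assumptions: the helper name make_revert_patch and the commit subject "<name>: remove" are made up for illustration, and it must be run from the top of a clean git tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch series): remove a directory,
# commit the removal, revert that commit, and write the revert as a patch.
# The commit subject "<name>: remove" is an assumption.
make_revert_patch() {
    dir=$1
    name=$2
    git rm -r -q "$dir" &&
    git commit -q -m "$name: remove" &&
    git revert --no-edit HEAD > /dev/null &&
    # format-patch prints the name of the patch file it wrote
    git format-patch -1 HEAD
}

# Example, from the top of a kernel tree:
#   make_revert_patch drivers/usb usb
#   make_revert_patch mm mm
```

With the subject chosen here, the usb patch should come out named like the 0001-Revert-usb-remove.patch file used in the timing commands.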

Two stages (each tested twice, unit: seconds):

Before sorting by this patch:
mm    264   264
usb  1049  1047

After sorting by this patch:
mm    264   265
usb  1047  1045

According to the test results, the performance before and after sorting is basically
the same: the runs differ by at most 4 seconds out of roughly 1050 for usb, and by at
most 1 second for mm.

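The test method is as follows:
# record wall-clock time (whole seconds) before and after the checkpatch run
start=$(date +%s)
scripts/checkpatch.pl --types=TYPO_SPELLING 0001-Revert-usb-remove.patch > /dev/null
end=$(date +%s)
# elapsed seconds for this run
seconds=$((end - start))
echo "$seconds"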
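
To repeat the measurement for several patches, the test method above could be wrapped as in the sketch below. The helper names time_cmd and bench_patches are hypothetical, as is the mm patch file name in the example; only the checkpatch command line is taken from the test above.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdout discarded and print the
# elapsed wall-clock time in whole seconds (same method as above).
time_cmd() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo "$((end - start))"
}

# Hypothetical helper: time checkpatch twice on each patch given and
# print one line per patch: "<patch> <first run> <second run>".
bench_patches() {
    for patch in "$@"; do
        first=$(time_cmd scripts/checkpatch.pl --types=TYPO_SPELLING "$patch")
        second=$(time_cmd scripts/checkpatch.pl --types=TYPO_SPELLING "$patch")
        echo "$patch $first $second"
    done
}

# Example, from the top of a kernel tree (the mm file name is assumed):
#   bench_patches 0001-Revert-mm-remove.patch 0001-Revert-usb-remove.patch
```

Since date +%s has one-second resolution, this is only meaningful for long runs like the ones measured here.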


>
>>>
>>>
>>
>> Okay, I'll try it
>>
>>>
>>> .
>>>