Re: [PATCH 0/3] scripts/spelling.txt: add some spelling pairs and reorder

From: Andy Shevchenko
Date: Fri Jun 11 2021 - 04:11:50 EST


On Fri, Jun 11, 2021 at 11:02 AM Andy Shevchenko
<andy.shevchenko@xxxxxxxxx> wrote:
> On Fri, Jun 11, 2021 at 10:19 AM Zhen Lei <thunder.leizhen@xxxxxxxxxx> wrote:
> >
> > Add spelling_sanitizer.sh and use it to reorder, then add some spelling
> > "mistake||correction" pairs.
>
> The sorting idea is good, but the order is not.
> What you really need is to use language corpus [1] instead. So in such
> case you will eliminate false positives (to some extent).

Perhaps I need to elaborate on what I meant. The important feature of
the corpus is that it is sorted by frequency of word usage, and that is
the order that would work best here. Unfortunately, I don't know whether
codespell uses a linear search or a hash-based lookup (in other words,
does it convert the input file to a Python list() or set() object?).
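
To show why that question matters, here is a rough Python sketch. It is
not codespell's code, and I have not checked what codespell does
internally; parse_pairs(), linear_lookup() and hashed_lookup() are
hypothetical names, and the sample entries are made up. It only assumes
the "mistake||correction" line format mentioned above.

```python
# Hypothetical sketch, NOT codespell's actual implementation.

def parse_pairs(lines):
    """Parse 'mistake||correction' lines into (mistake, correction) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        mistake, correction = line.split("||", 1)
        pairs.append((mistake, correction))
    return pairs


def linear_lookup(pairs, word):
    """Scan in file order; return (correction, number of comparisons)."""
    for steps, (mistake, correction) in enumerate(pairs, start=1):
        if mistake == word:
            return correction, steps
    return None, len(pairs)


def hashed_lookup(table, word):
    """Dict lookup; the cost does not depend on the order of the file."""
    return table.get(word)


# Made-up sample, assumed to be ordered most frequent mistake first.
sample = [
    "# mistake||correction",
    "teh||the",
    "recieve||receive",
    "accomodate||accommodate",
]

pairs = parse_pairs(sample)   # list: lookup cost grows with position
table = dict(pairs)           # hash table: position is irrelevant

print(linear_lookup(pairs, "teh"))
print(linear_lookup(pairs, "accomodate"))
print(hashed_lookup(table, "recieve"))
```

With the list, an entry near the top of the file is found after fewer
comparisons, so frequency order would help. With the dict (or a set()),
the file order has no effect on lookup cost.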

> [1]: https://en.wikipedia.org/wiki/Corpus_of_Contemporary_American_English


--
With Best Regards,
Andy Shevchenko