Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

From: Nícolas F. R. A. Prado
Date: Wed Oct 14 2020 - 16:09:26 EST


On Wed Oct 14, 2020 at 4:11 PM -03, Jonathan Corbet wrote:
>
> On Tue, 13 Oct 2020 23:13:17 +0000
> Nícolas F. R. A. Prado <nfraprado@xxxxxxxxxxxxxx> wrote:
>
> > The warnings were caused by the expressions matching words in the
> > translated versions of the documentation, since any unicode character
> > was matched.
> >
> > Fix the regular expression by making the C regexes use ASCII
>
> I don't quite understand this part, can you give an example of the kinds
> of warnings you were seeing?

Hi Jon,
sure.

One I had noted down was:

WARNING: Unparseable C cross-reference: '调用debugfs_rename'

which I believe occurred in the Chinese translation.

I think the problem is that Chinese text normally has no spaces between words.
So even if I had made the regexes match only at the beginning of a word (which
I didn't, but this patch fixes that with \b), they would still try to
cross-reference a symbol containing Chinese characters, which Sphinx can't
parse.
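To illustrate, here is a minimal sketch using a hypothetical, simplified pattern
for "name()" references (not the actual automarkup.py regex), and assuming the
translated text reads "调用debugfs_rename()":

```python
import re

# Hypothetical, simplified pattern for "name()" function references;
# this is NOT the exact regex used in automarkup.py.
FUNC_UNICODE = re.compile(r'\b(\w+)\(\)')

# Assumed input: Chinese text with no space before the C identifier.
text = '调用debugfs_rename()'

m = FUNC_UNICODE.search(text)
# For str patterns, Python 3 uses Unicode matching by default, so CJK
# ideographs count as \w. There is no \b between '用' and 'd', and the
# captured "identifier" includes the Chinese prefix.
print(m.group(1))  # 调用debugfs_rename
```

That captured string is what ends up being handed to Sphinx as a C
cross-reference.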

Since the C identifiers we cross-reference are ASCII-only anyway, I used the
ASCII flag to make \w and \d match only ASCII characters; without it they match
any Unicode word character or digit.
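A minimal sketch of the effect of re.ASCII, again with a hypothetical,
simplified pattern rather than the real automarkup.py one. The helper name
c_function_refs is made up for illustration:

```python
import re

# Hypothetical, simplified pattern (not the exact automarkup.py regex).
# re.ASCII makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII only.
FUNC_ASCII = re.compile(r'\b(\w+)\(\)', flags=re.ASCII)


def c_function_refs(text):
    """Return the names of 'name()' references, ASCII identifiers only."""
    return FUNC_ASCII.findall(text)


# With re.ASCII, '用' is no longer a word character, so \b now falls
# between '用' and 'd' and only the C identifier is captured.
print(c_function_refs('调用debugfs_rename()'))  # ['debugfs_rename']

# \d is affected the same way: U+FF13 (fullwidth digit three) is a
# Unicode decimal digit but not an ASCII one.
print(bool(re.fullmatch(r'\d', '\uff13')))                 # True
print(bool(re.fullmatch(r'\d', '\uff13', flags=re.ASCII)))  # False
```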

If you want to look at other warnings or more complete output, let me know and
I will rebuild those translations. That warning was the only one I noted down,
but I think it gives a good idea of the problem.

Thanks,
Nícolas

>
> Thanks,
>
> jon