Re: [PATCH] get_maintainer: correctly parse UTF-8 encoded names in files

From: Alvin Šipraga
Date: Thu Dec 14 2023 - 09:57:55 EST


On Wed, Dec 13, 2023 at 05:41:59PM -0800, Linus Torvalds wrote:
> On Wed, 13 Dec 2023 at 17:06, Alvin Šipraga <ALSI@xxxxxxxxxxxxxxx> wrote:
> >
> > Sorry to be a nuisance, but could you please have another look below and
> > reconsider this patch? Otherwise NAK is fine, but I wanted to follow up
> > on this as it solves an actual, albeit minor, issue for people with
> > unusual names when sending and receiving patches.
>
> The patch seems bogus, because it shouldn't have any "Latin" encoding
> issues at all.
>
> Opening as utf8 makes sense, but the "Latin" part of the regular
> expressions seem bogus.
>
> IOW, isn't '\p{L}' the right pattern for a "letter"? Isn't that what
> we actually care about here?

Yes, you have a point: I was being too conservative in choosing
'\p{Latin}'. I will send a v2 using '\p{L}'.
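
To illustrate the difference, here is a rough sketch in Python rather than
the Perl used by get_maintainer.pl. It is not the patch itself. The helper
names are hypothetical, and the "Latin" check only approximates '\p{Latin}'
by looking at Unicode character names, whereas the letter check follows the
general category that '\p{L}' is defined by.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Same idea as \p{L}: general category Lu, Ll, Lt, Lm or Lo.
    return unicodedata.category(ch).startswith("L")


def is_latin_letter(ch: str) -> bool:
    # Approximation of \p{Latin} (an assumption for this sketch):
    # a letter whose Unicode name starts with "LATIN".
    return is_letter(ch) and unicodedata.name(ch, "").startswith("LATIN")


def name_is_all(pred, name: str) -> bool:
    # True if every non-whitespace character of the name satisfies pred.
    return all(pred(ch) for ch in name if not ch.isspace())


if __name__ == "__main__":
    for name in ["Alvin Šipraga", "Иван Петров", "山田 太郎"]:
        print(name,
              "latin-only:", name_is_all(is_latin_letter, name),
              "letters:", name_is_all(is_letter, name))
```

A name written in Cyrillic or CJK passes the letter check but fails the
Latin-only one, which is the locale bug the review points out.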

>
> Replacing one locale bug with just another locale bug seems pointless.

Thanks for the review!

Kind regards,
Alvin