Re: [PATCH v2] unicode: don't write -1 after NUL terminator

From: Gabriel Krisman Bertazi
Date: Mon Nov 07 2022 - 09:45:33 EST


"Jason A. Donenfeld" <Jason@xxxxxxxxx> writes:

> If the intention is to overwrite the first NUL with a -1, s[strlen(s)]
> is the first NUL, not s[strlen(s)+1].

Hi Jason,

This code is part of the verification of the trie that is done at the
end of utf8data generation. It makes sure the tree is not corrupted, by
ensuring that utf8byte doesn't see anything past the correct end of the
string (the first NUL byte). Note that it is not a bad memory access
either, since we are guaranteed to have allocated enough space.

So I think the code is correct as is. If you apply your patch and
regenerate utf8data.h_shipped, utf8byte will reach that -1 and fail the
verification.
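
To illustrate the idea, here is a minimal standalone sketch. It is not
the mkutf8data.c code: struct cursor, next_byte() and walk_with_poison()
are hypothetical names, and the walker is a simplified stand-in for
utf8cursor/utf8byte. It only assumes that the walker stops at the first
NUL and reports an error if it ever reads the poison byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a cursor over a NUL-terminated string. */
struct cursor {
	const char *s;
};

/*
 * Return the next byte, 0 at the terminating NUL, or -1 if the
 * poison byte (0xff, never valid in UTF-8) is read.
 */
static int next_byte(struct cursor *c)
{
	unsigned char b = (unsigned char)*c->s;

	if (b == 0xff)
		return -1;	/* walked into the poison byte */
	if (b == 0)
		return 0;	/* correct end of the string */
	c->s++;
	return b;
}

/*
 * Copy text into a zeroed buffer, write the poison byte at offset
 * strlen(text) + off, and walk the buffer.
 * Returns 0 if the walk ended at the NUL, -1 if it saw the poison.
 */
static int walk_with_poison(const char *text, size_t off)
{
	char buf[32] = { 0 };
	struct cursor c = { buf };
	int b;

	/* Room for the text, its NUL and one byte after it. */
	assert(strlen(text) + 2 <= sizeof(buf));
	strcpy(buf, text);
	buf[strlen(buf) + off] = (char)-1;

	while ((b = next_byte(&c)) > 0)
		;
	return b;
}
```

With off == 1 the poison sits just past the NUL, so a correct walker
never reads it and walk_with_poison("abc", 1) returns 0. With off == 0
the NUL itself is overwritten, and even a correct walker runs into the
poison and returns -1, which is the failure described above.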

> Cc: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> fs/unicode/mkutf8data.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/unicode/mkutf8data.c b/fs/unicode/mkutf8data.c
> index bc1a7c8b5c8d..61800e0d3226 100644
> --- a/fs/unicode/mkutf8data.c
> +++ b/fs/unicode/mkutf8data.c
> @@ -3194,7 +3194,7 @@ static int normalize_line(struct tree *tree)
> /* Second test: length-limited string. */
> s = buf2;
> /* Replace NUL with a value that will cause an error if seen. */
> - s[strlen(s) + 1] = -1;
> + s[strlen(s)] = -1;
> t = buf3;
> if (utf8cursor(&u8c, tree, s))
> return -1;

--
Gabriel Krisman Bertazi