Re: [PATCH] um: refactor deprecated strncpy to strtomem

From: Kees Cook
Date: Tue Aug 08 2023 - 20:42:02 EST


On Tue, Aug 08, 2023 at 10:28:57AM -0700, Justin Stitt wrote:
> On Mon, Aug 7, 2023 at 4:40 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Aug 07, 2023 at 03:36:55PM -0700, Bill Wendling wrote:
> > > On Mon, Aug 7, 2023 at 2:18 PM Justin Stitt <justinstitt@xxxxxxxxxx> wrote:
> > > >
> > > > Use `strtomem` here since `console_buf` is not expected to be
> > > > NUL-terminated. We should probably also just use `MCONSOLE_MAX_DATA`
> >
> > How is it known that console_buf is not a C-string?
> There are a few clues that led me to believe console_buf was not a C-string:
> 1) `n = min((size_t) len, ARRAY_SIZE(console_buf));` means that `n`
> can be equal to size of buffer which is then subsequently filled
> entirely by the `strncpy` invocation.
> 2) console_buf looks to be a circular buffer wherein once it's filled
> it starts again from the beginning.
> 3) ARRAY_SIZE is used on it as opposed to strlen or something like
> that (but not sure if ARRAY_SIZE is also used on C-strings to be fair)
> 4) It has `buf` in its name which I loosely associate with non
> C-strings char buffers.
>
> All in all, it looks to be a non C-string but honestly it's hard to
> tell, especially since if it IS a C-string the previous implementation
> had some bugs with strncpy I believe.

I took a look at this. It's only used in that one function, and it's
always working from the start, and uses at max ARRAY_SIZE(console_buf),
as you mentioned. Then it's passed to mconsole_reply_len() with "len",
so we can examine that:

do {
...
len = MIN(total, MCONSOLE_MAX_DATA - 1);
...
memcpy(reply.data, str, len);
reply.data[len] = '\0';
total -= len;
...
} while (total > 0);

It's sending it as NUL-terminated, possibly breaking it up. If this used
the whole MCONSOLE_MAX_DATA, it would send MCONSOLE_MAX_DATA - 1 bytes
followed by NUL, and then 1 byte, followed by NUL. :P

Anyway, point being, yes, it appears that console_buf is a nonstring.
But it's a weird one...

In your v2 patch, you use strtomem(), which is probably close, but in
looking at the implementation details here, I'm starting to think that
strtomem() needs to return the number of bytes copied. Initially I was
thinking it could actually just be replaced with memcpy(), but there
are some side-effects going on in the original code.

First:

static void console_write(..., const char *string, unsigned int len)
...
n = min((size_t) len, ARRAY_SIZE(console_buf));
strncpy(console_buf, string, n);

The 'n' is being passed in, so it's possible that "string" has NUL-bytes
in it. (Though I would assume, unlikely -- I haven't tracked down the
callers of console_write() here...)

That means that strncpy() will stop the copy at the first NUL and
then NUL pad the rest to 'n', and that's what gets passed down to
mconsole_reply_len() (which is specifically using memcpy and will carry
those NUL bytes forward). So we should not lose the NUL-padding behavior.

Additionally, when "len" is smaller than MCONSOLE_MAX_DATA, the padding
will only go up to "len", not MCONSOLE_MAX_DATA. So there's a weird
behavioral difference here where the new code is doing more work, but,
okay fine.

I find this original code to be rather confused about whether it is
dealing with C-strings or byte arrays. :|

So now I wanted to find the callers of struct console::write. My
Coccinelle script:

@found@
struct console *CON;
@@

* CON->write(...)

show hits in kernel/printk/printk.c, which is dealing with a
NUL-terminated string constructed by vsnprintf():

console_emit_next_record

Technically there could be NUL bytes in such a string, but almost
everything else expects to be dealing with C-strings here. This looks
really fragile.

So, I'm back around to thinking this should just be memcpy():

- strncpy(console_buf, string, n);
+ memcpy(console_buf, string, n);


--
Kees Cook