Re: RFC: printk: kmsg_dump_get_line_nolock() buffer overflow

From: Petr Mladek
Date: Thu Jan 14 2021 - 10:21:30 EST


On Wed 2021-01-13 21:06:28, John Ogness wrote:
> Hello,
>
> I have discovered that kmsg_dump_get_line_nolock() is not allowed to
> fill the full buffer that it is provided. It should leave at least 1
> byte free so that callers can append a terminator.
>
> Example from arch/powerpc/xmon/xmon.c:
>
> kmsg_dump_get_line_nolock(&dumper, false, buf, sizeof(buf), &len);
> buf[len] = '\0';
>
> This unusual behavior was not noticed and with commit 896fbe20b4e2
> ("printk: use the lockless ringbuffer") the implementation of
> kmsg_dump_get_line_nolock() was changed so that the full buffer can be
> filled. This means that the two kmsg_dump_get_line*() users currently
> can have a 1-byte buffer overflow.

Just to make it clear. There is no visible change in
kmsg_dump_get_line_nolock() in the commit 896fbe20b4e2
("printk: use the lockless ringbuffer").

The eal change happened in record_printk_text():

- if (buf) {
- if (prefix_len + text_len + 1 >= size - len)
+ /*
+ * Truncate the text if there is not enough space to add the
+ * prefix and a trailing newline.
+ */
+ if (len + prefix_len + text_len + 1 > buf_size) {
+ /* Drop even the current line if no space. */
+ if (len + prefix_len + line_len + 1 > buf_size)
break;

It replaced ">=" with ">".

It is pitty that I have missed this. I remember that I discussed
exactly this problem before, see
https://lore.kernel.org/lkml/20190710080402.ab3f4qfnvez6dhtc@xxxxxxxx/

And I did exactly the same mistake. I have missed the two users in
"arch/powerpc" and "arch/um".

> This unusual kmsg_dump_get_line_nolock() behavior seems to have been
> accidentally introduced with commit 3ce9a7c0ac28 ("printk() - restore
> prefix/timestamp printing for multi-newline strings"). Indeed, the
> whitespace on the line that causes this behavior is not conform, leading
> me to think it was a last-minute change or a typo. (The behavior is
> caused by the ">=" instead of an expected ">".)
>
> + if (print_prefix(msg, syslog, NULL) +
> + text_len + 1>= size - len)
> + break;
>
> Perhaps there was some paranoia involved because this same commit also
> fixes a buffer overflow in the old implementation:
>
> - if (len + msg->text_len > size)
> - return -EINVAL;
> - memcpy(text + len, log_text(msg), msg->text_len);
> - len += msg->text_len;
> - text[len++] = '\n';

It is clear that this problem happens repeatedly.

Now, the change in record_printk_text() behavior affects also other
callers. For example, syslog_print() fills the buffer completely
as well now. I could imagine a userspace code that does the same
mistake and it works just by chance.

We should restore the original record_printk_text() behavior
and add the comment explaining why it is done this way. And
I would even explicitly add the trailing '\0' as suggested at
https://lore.kernel.org/lkml/20190710121049.rwhk7fknfzn3cfkz@xxxxxxxxxxxxxxx/#t

Best Regards,
Petr