Re: pstore/ram: printk: NULL characters in pstore ramoops area

From: Petr Mladek
Date: Tue Aug 08 2023 - 14:36:48 EST


On Mon 2023-08-07 10:19:07, Vijay Balakrishna wrote:
> I'm including my earlier email as it didn't deliver to
> linux-kernel@xxxxxxxxxxxxxxx due to HTML subpart.  Also sharing new findings
> --
>
> Limiting the size of buffer exposed to record_print_text() and
> prb_for_each_record() in kmsg_dump_get_buffer() also resolves this issue [5]
> -- no NULL characters in pstore/ramoops memory.  The advantage is no memory
> allocation (as done in previously shared changes [4]) which could be
> problematic during kernel shutdown/reboot or during kexec reboot.
>
> [5]
>
> Author: Vijay Balakrishna <vijayb@xxxxxxxxxxxxxxxxxxx>
> Date:   Sat Aug 5 18:47:27 2023 +0000
>
>     printk: limit the size of buffer exposed to record_print_text() by
> kmsg_dump_get_buffer()
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b82e4c3b52f4..8feec932aa35 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3453,9 +3453,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
> bool syslog,
>          */
>         next_seq = seq;
>
> -       prb_rec_init_rd(&r, &info, buf, size);
>
>         len = 0;
> +       prb_rec_init_rd(&r, &info, buf + len, (size - len) >= LOG_LINE_MAX +
> PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>         prb_for_each_record(seq, prb, seq, &r) {
>                 if (r.info->seq >= dumper->next_seq)
>                         break;
> @@ -3463,7 +3463,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
> bool syslog,
>                 len += record_print_text(&r, syslog, time);
>
>                 /* Adjust record to store to remaining buffer space. */
> -               prb_rec_init_rd(&r, &info, buf + len, size - len);
> +               prb_rec_init_rd(&r, &info, buf + len, (size - len) >=
> LOG_LINE_MAX + PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>         }
>
>         dumper->next_seq = next_seq;


I looks like some problems with counting data that fit into the
buffer. I see that several fixes were added after 5.10 release.
I wonder if they help to solve this:

commit 89ccf18f032f26946 ("printk: fix kmsg_dump_get_buffer length calulations")
commit f0e386ee0c0b71ea6 ("printk: fix buffer overflow potential for print_text()")
commit 08d60e59995401105 ("printk: fix string termination for record_print_text()")

All 3 commits were backported into 5.10 stable.

The 2nd commit without the 3rd one might cause writing an extra "\0"
into a wrong place.


> On 8/3/23 16:34, Vijay Balakrishna wrote:
> >
> > Hello,
> >
> > We are noticing NULL characters in ramoops/pstore memory after a warm or
> > a kexec reboot [1] in our 5.10 ARM64 product kernel after moving from
> > 5.4 kernel.  I ruled out fs/pstore/* as the source from where NULL
> > characters originate by adding debug code [2] and confirming from
> > collected output [3].  Then isolated further to printk log/ring buffer
> > area, the NULL characters were already present in buffer in
> > kmsg_dump_get_buffer() when kmsg log lines are read.  After looking at
> > printk merges in mainline kernel, I cherry-picked following which looked
> > related to our 5.10 kernel and still see NULL characters.
> >
> > 4260e0e5510158d704898603331e5365ebe957de printk: consolidate
> > kmsg_dump_get_buffer/syslog_print_all code
> > 726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
> > kmsg_dump_get_buffer()
> > bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
> > of syslog_print_all

These commits tried to reduce a code duplication between kmsg_dump
and syslog API.

> > Looking at syslog_print_all() I notice it uses a local buffer unlike
> > kmsg_dump_get_buffer() which manipulates buffer in-place to add syslog
> > and timestamp prefix data.

syslog_print_all() gets a buffer from userspace. It can be written
only by copy_to_user(). It allocates an extra buffer so that it could
do all the message formatting in the kernel space.

> I made changes [4] to kmsg_dump_get_buffer()
> > to use a local buffer similar to syslog_print_all() after which I don't
> > see NULL characters in ramoops area.  I couldn't spot any suspicious
> > code in record_print_text() where prefix data added in-place.  I'm
> > reaching out to both pstore/ram and printk folks for comments. I can
> > investigate/debug further with assistance and input from you.

It is more safe with the extra buffer. It is always used only
for one message. It is possible that the NULL character was
also written in a wrong place there. But it did not affect
the buffer passed to kmsg_dump_get_buffer().

I hope that the above three commit fixing the length calculation
and potential buffer overflow will fix this.

Anyway, thanks a lot for debugging this and providing all the details.

Best Regards,
Petr