Re: pstore/ram: printk: NULL characters in pstore ramoops area

From: Vijay Balakrishna
Date: Tue Aug 08 2023 - 21:21:52 EST


Thanks for your reply Petr.

See inline.

On 8/8/23 01:15, Petr Mladek wrote:
On Mon 2023-08-07 10:19:07, Vijay Balakrishna wrote:
I'm including my earlier email as it didn't deliver to
linux-kernel@xxxxxxxxxxxxxxx due to HTML subpart.  Also sharing new findings
--

Limiting the size of buffer exposed to record_print_text() and
prb_for_each_record() in kmsg_dump_get_buffer() also resolves this issue [5]
-- no NULL characters in pstore/ramoops memory.  The advantage is no memory
allocation (as done in previously shared changes [4]) which could be
problematic during kernel shutdown/reboot or during kexec reboot.

[5]

Author: Vijay Balakrishna <vijayb@xxxxxxxxxxxxxxxxxxx>
Date:   Sat Aug 5 18:47:27 2023 +0000

    printk: limit the size of buffer exposed to record_print_text() by
kmsg_dump_get_buffer()

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b82e4c3b52f4..8feec932aa35 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3453,9 +3453,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
bool syslog,
         */
        next_seq = seq;

-       prb_rec_init_rd(&r, &info, buf, size);

        len = 0;
+       prb_rec_init_rd(&r, &info, buf + len, (size - len) >= LOG_LINE_MAX +
PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
        prb_for_each_record(seq, prb, seq, &r) {
                if (r.info->seq >= dumper->next_seq)
                        break;
@@ -3463,7 +3463,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
bool syslog,
                len += record_print_text(&r, syslog, time);

                /* Adjust record to store to remaining buffer space. */
-               prb_rec_init_rd(&r, &info, buf + len, size - len);
+               prb_rec_init_rd(&r, &info, buf + len, (size - len) >=
LOG_LINE_MAX + PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
        }

        dumper->next_seq = next_seq;

Any comments on above change to limit buffer size/range exposed?

The buffer passed to kmsg_dump_get_buffer() is kzalloc()'ed in fs/pstore/ram.c: ramoops_probe()

                cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);

that may explain NULL characters in buffer.

Output seen after adding NULL character check before calling record_print_text() suggest the issue may not be in record_print_text().

(diffs below are from v5.10.178)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 17a310dcb6d9..4cbb845424f5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3431,10 +3431,17 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
        /* actually read text into the buffer now */
        l = 0;
        while (prb_read_valid(prb, seq, &r)) {
+               size_t pr_len;
                if (r.info->seq >= dumper->next_seq)
                        break;

-               l += record_print_text(&r, syslog, time);
+               if (null_char(r.text_buf, r.info->text_len))
+                       pr_crit("%s: pre record_print_text() a NULL char in record text_buf, l %zu, textlen %zu\n", __func__, l, r.info->text_len);
+               pr_len = record_print_text(&r, syslog, time);
+               l += pr_len;
+
+               if (null_char(r.text_buf, pr_len))
+                       pr_crit("%s: post record_print_text() a NULL char in record text_buf, l %zu, pr_len %zu\n", __func__, l, pr_len);

                /* adjust record to store to remaining buffer space */
                prb_rec_init_rd(&r, &info, buf + l, size - l);

(output during warm reboot)

[   12.400817] ramoops: ramoops_pstore_write: A NULL char in record buf, size 70648
[   12.408217] printk: kmsg_dump_get_buffer: pre record_print_text() a NULL char in record text_buf, l 65505, textlen 53
[   12.408219] printk: kmsg_dump_get_buffer: post record_print_text() a NULL char in record text_buf, l 65578, pr_len 73

As I replied to Kees last week, we will try to repro with modern/mainline kernel hopefully soon.


I looks like some problems with counting data that fit into the
buffer. I see that several fixes were added after 5.10 release.
I wonder if they help to solve this:

commit 89ccf18f032f26946 ("printk: fix kmsg_dump_get_buffer length calulations")
commit f0e386ee0c0b71ea6 ("printk: fix buffer overflow potential for print_text()")
commit 08d60e59995401105 ("printk: fix string termination for record_print_text()")

Above commit changes are already in v5.10.187 kernel where we are seeing the NULL characters in ramoops.


All 3 commits were backported into 5.10 stable.

The 2nd commit without the 3rd one might cause writing an extra "\0"
into a wrong place.


On 8/3/23 16:34, Vijay Balakrishna wrote:
Hello,

We are noticing NULL characters in ramoops/pstore memory after a warm or
a kexec reboot [1] in our 5.10 ARM64 product kernel after moving from
5.4 kernel.  I ruled out fs/pstore/* as the source from where NULL
characters originate by adding debug code [2] and confirming from
collected output [3].  Then isolated further to printk log/ring buffer
area, the NULL characters were already present in buffer in
kmsg_dump_get_buffer() when kmsg log lines are read.  After looking at
printk merges in mainline kernel, I cherry-picked following which looked
related to our 5.10 kernel and still see NULL characters.

4260e0e5510158d704898603331e5365ebe957de printk: consolidate
kmsg_dump_get_buffer/syslog_print_all code
726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
kmsg_dump_get_buffer()
bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
of syslog_print_all
These commits tried to reduce a code duplication between kmsg_dump
and syslog API.

Looking at syslog_print_all() I notice it uses a local buffer unlike
kmsg_dump_get_buffer() which manipulates buffer in-place to add syslog
and timestamp prefix data.
syslog_print_all() gets a buffer from userspace. It can be written
only by copy_to_user(). It allocates an extra buffer so that it could
do all the message formatting in the kernel space.

I made changes [4] to kmsg_dump_get_buffer()
to use a local buffer similar to syslog_print_all() after which I don't
see NULL characters in ramoops area.  I couldn't spot any suspicious
code in record_print_text() where prefix data added in-place.  I'm
reaching out to both pstore/ram and printk folks for comments. I can
investigate/debug further with assistance and input from you.
It is more safe with the extra buffer. It is always used only
for one message. It is possible that the NULL character was
also written in a wrong place there. But it did not affect
the buffer passed to kmsg_dump_get_buffer().

I hope that the above three commit fixing the length calculation
and potential buffer overflow will fix this.

Anyway, thanks a lot for debugging this and providing all the details.
Thanks,
VIjay

Best Regards,
Petr