Re: Possible bug for ZSTD kernel decompressing

From: Feng Tang
Date: Wed Feb 02 2022 - 00:07:46 EST


Hi Nick,

On Mon, Jan 31, 2022 at 08:31:10PM +0000, Nick Terrell wrote:
>
>
> > On Jan 27, 2022, at 8:53 PM, Feng Tang <feng.tang@xxxxxxxxx> wrote:
> >
> > Hi All,
> >
> > Recently 0Day reported a 32bit i386 kernel decompression failure for my
> > patch [1], which essentially increases the kernel data section's size
> > from 19MB to 53MB, with the following error message:
> >
> > early console in setup code
> > early console in extract_kernel
> > input_data: 0x05077079
> > input_len: 0x00f8a633
> > output: 0x01000000
> > output_len: 0x045c4328
> > kernel_total_size: 0x05040000
> > needed_size: 0x05040000
> >
> > Decompressing Linux...
> >
> > ZSTD-compressed data is corrupt
> >
> > -- System halted
> >
> > BUG: kernel hang in boot stage
> >
> > From debugging, this looks like a problem in the ZSTD decompression
> > code: when I reverted my patch and instead hacked the kernel to grow
> > the data section by 32MB, the same error occurred.
> >
> > Some other hints are:
> > * the same i386 config boots with the lz4 and xz algorithms
> > * X86_64 + zstd also boots fine
> >
> > This can be reproduced with the following qemu command:
> >
> > qemu-system-i386 -machine pc -cpu host -enable-kvm -kernel bzImage -m 2048m -smp 4 -serial stdio --append "earlyprintk=ttyS0,115200 console=ttyS0,115200"
> >
> > The i386 kernel config is attached, and the debug patch is below:
> > ---
> > diff --git a/init/main.c b/init/main.c
> > index 767ee2672176..873f40ddf96e 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -162,6 +162,10 @@ static size_t initargs_offs;
> > static char *execute_command;
> > static char *ramdisk_execute_command = "/init";
> >
> > +#define DT_SIZE 8192000
> > +static unsigned long tbuf[DT_SIZE] = { 1, 2, 3, 4, };
> > +
> > /*
> > * Used to generate warnings if static_key manipulation functions are used
> > * before jump_label_init is called.
> > @@ -690,6 +694,11 @@ noinline void __ref rest_init(void)
> > struct task_struct *tsk;
> > int pid;
> >
> > + unsigned long i, j = 0;
> > + for (i = 0; i < DT_SIZE; i++)
> > + j += tbuf[i];
> > + printk("j = 0x%lx\n", j);
> > +
> > rcu_scheduler_starting();
> > /*
> > * We need to spawn init first so that it obtains pid 1, however
> >
> > Please let me know if you need more info.
> >
> > [1.] https://lore.kernel.org/lkml/1627456900-42743-1-git-send-email-feng.tang@xxxxxxxxx/
>
> I've been unable to reproduce this issue using the provided patch + config based on
> Linux v5.17-rc2.
>
> What version of Linux are you testing on? Zstd was updated in v5.16, so if you're not
> testing on v5.16 or later, can you please re-test on v5.17-rc2?

The original report I got was against commit 8cd7c588decf
("mm/vmscan: throttle reclaim until some writeback completes if congested"),
which is post-5.15.

I just retested and the issue can _not_ be reproduced against 5.17-rc2.
Thanks for the check and fix, and sorry for not trying the latest kernel.

- Feng