Re: [GIT PULL] pstore updates for v6.6-rc1

From: Eric Biggers
Date: Wed Aug 30 2023 - 02:06:09 EST

Next message: Goud, Srinivas: "RE: [PATCH v3 1/3] dt-bindings: can: xilinx_can: Add ECC property 'xlnx,has-ecc'"
Previous message: Google: "Re: [PATCH v3] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms"
In reply to: Ard Biesheuvel: "Re: [GIT PULL] pstore updates for v6.6-rc1"
Next in thread: Ard Biesheuvel: "Re: [GIT PULL] pstore updates for v6.6-rc1"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

On Tue, Aug 29, 2023 at 11:43:37PM +0200, Ard Biesheuvel wrote:
> On Tue, 29 Aug 2023 at 20:04, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 29 Aug 2023 at 10:29, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > >
> > > This is an oversight on my part. The diff below plugs this into the pstore code
> >
> > Hmm. My reaction is that you should also add the comment about the
> > "Work around a bug in zlib" issue, because this code makes no sense
> > otherwise.
> >
>
> Naturally. But pasting the comment into the email body seemed unnecessary.
>
> > Of course, potentially even better would be to actually move this fix
> > into our copy of zlib. Does the bug / misfeature still exist in
> > upstream zlib?
> >
>
> I have no idea. You are the one sitting on the only [potential]
> reproducer I am aware of, and there is nothing in the git (or prior)
> history that sheds any light on this. Could you copy one of those EFI
> variables to a file and share it so I can try and reproduce this?
>
> > Also, grepping around a bit, I note that btrfs, for example, just does
> > that Z_SYNC_FLUSH, and then checks for Z_OK. None of this Z_STREAM_END
> > stuff.
> >
> > Similarly, going off to the debian code search, I find other code that
> > just accepts either Z_OK or Z_STREAM_END.
> >
> > So what's so magical about how pstore uses zlib? This is just very odd.
> >
>
> AIUI, zlib can be used in raw mode and with a header/footer. Passing
> the wbits parameter as a negative number (!) will switch into raw
> mode.
>
> The btrfs and jffs2 occurrences both default to positive wbits, and
> switch to negative in a very specific case that involves zlib
> internals that I don't understand. crypto/deflate.c implements both
> modes, and pstore always used the raw one in all cases.
>
> The workaround in crypto/deflate.c is documented as being specific to
> the raw mode, which is why it makes sense to at least verify whether
> the same workaround that pstore-deflate was using when doing raw zlib
> through the crypto API is still needed now that it calls zlib
> directly.

I get the "pstore: zlib_inflate() failed, ret = -5!" messages on my system too,
so I looked into it. What's happening is that the pstore records are coming
from the efi_pstore backend, which has pstore_info::bufsize of 1024 bytes, but
they decompress to more than 1024 bytes. Since pstore now sizes the buffer for
decompressed data to only pstore_info::bufsize, lib/zlib_inflate correctly
returns Z_BUF_ERROR. (BTW, I write "lib/zlib_inflate", not "zlib", since it was
forked from the real zlib 25 years ago. Regardless, the problem isn't there.)

I think we partially misinterpreted the functions like zbufsize_deflate() that
Ard's patches removed. They seemed to be used only for getting the worst case
compressed size for an uncompressed size of pstore_info::bufsize. Actually,
they were used both for that, *and* for getting the uncompressed size to try to
compress into pstore_info::bufsize. (Which is very weird, as these are two very
different concepts.)

So I think first we need to decide whether pstore should try to use compression
to fit more than pstore_info::bufsize of data in each pstore record, as opposed
to just shrinking the size of the record actually written. If yes, then this
really needs some more thought than the previous code which seemed to do this by
accident. If no, then the new approach is fine.

BTW, what happens if someone was using a non-DEFLATE algorithm and then upgrades
their kernel? Decompression can fail for that too. So maybe pstore just needs
to consider that decompression of pstore records written by an older kernel can
fail -- either due to the algorithm changing or due to the uncompressed size
being too large for the new code -- and silence the error messages accordingly?
How "persistent" do these persistent store records really have to be?

- Eric

Next message: Goud, Srinivas: "RE: [PATCH v3 1/3] dt-bindings: can: xilinx_can: Add ECC property 'xlnx,has-ecc'"
Previous message: Google: "Re: [PATCH v3] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms"
In reply to: Ard Biesheuvel: "Re: [GIT PULL] pstore updates for v6.6-rc1"
Next in thread: Ard Biesheuvel: "Re: [GIT PULL] pstore updates for v6.6-rc1"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]