Re: [PATCH v5 0/3]: lib/lzo: run-length encoding support

From: Tao Liu
Date: Tue Mar 12 2024 - 04:29:07 EST


On Fri, Mar 8, 2024 at 8:32 PM Dave Rodgman <dave.rodgman@xxxxxxx> wrote:
>
> Hi Tao,
>
>
> I don’t see any reason for the upstream LZO library not to pick up the lzo-rle algorithm from the kernel, and I would expect the same performance benefit in userspace. This is really a question for Markus (the owner/maintainer of that library).
>
Hi Markus,

Is it possible to port the lzo-rle algorithm to the lzo library, so
userspace programs such as crash-utility or drgn can use it to
decompress the kernel data? Thanks in advance!

>
> I think the simplest short-term option would be to pull in the lzo library as source into crash-utility, and carry a patch against it to add support for lzo-rle.

Hi Dave,

Thanks for the suggestion! I agree with your short-term option; this
is what we are planning to do for now. Once lzo-rle has been integrated
into the lzo library, we can delete the patch from the crash-utility
code.

Thanks,
Tao Liu

>
>
> Dave
>
>
> From: Tao Liu <ltao@xxxxxxxxxx>
> Date: Friday, 8 March 2024 at 03:26
> To: Dave Rodgman <dave.rodgman@xxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>, Matt Sealey <Matt.Sealey@xxxxxxx>, davem@xxxxxxxxxxxxx <davem@xxxxxxxxxxxxx>, gregkh@xxxxxxxxxxxxxxxxxxx <gregkh@xxxxxxxxxxxxxxxxxxx>, herbert@xxxxxxxxxxxxxxxxxxx <herbert@xxxxxxxxxxxxxxxxxxx>, markus@xxxxxxxxxxxxx <markus@xxxxxxxxxxxxx>, minchan@xxxxxxxxxx <minchan@xxxxxxxxxx>, nitingupta910@xxxxxxxxx <nitingupta910@xxxxxxxxx>, rpurdie@xxxxxxxxxxxxxx <rpurdie@xxxxxxxxxxxxxx>, sergey.senozhatsky.work@xxxxxxxxx <sergey.senozhatsky.work@xxxxxxxxx>, sonnyrao@xxxxxxxxxx <sonnyrao@xxxxxxxxxx>, akpm@xxxxxxxxxxxxxxxxxxxx <akpm@xxxxxxxxxxxxxxxxxxxx>, sfr@xxxxxxxxxxxxxxxx <sfr@xxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>
> Subject: Re: [PATCH v5 0/3]: lib/lzo: run-length encoding support
>
> Hi Dave,
>
> On Tue, Feb 05, 2019 at 03:59:59PM +0000, Dave Rodgman wrote:
> > Hi,
> >
> > Following on from the previous lzo-rle patchset:
> >
> > https://lkml.org/lkml/2018/11/30/972
> >
> > This patchset contains only the RLE patches, and should be applied on top of
> > the non-RLE patches ( https://lkml.org/lkml/2019/2/5/366 ).
> >
>
> Sorry to revive such an old patchset and discussion. I have a few
> questions about lzo-rle support and hope you can give me some
> pointers. Thanks in advance!
>
> 1) Is lzo-rle suitable for a userspace library? I've checked the current
> userspace lzo library, lzo-2.10, and it seems to have no lzo-rle support
> (please correct me if I'm wrong). If lzo-rle performs better in the
> kernel, is it possible to implement it in userspace and gain the same
> performance benefit?
>
> 2) Yulong TANG has found that the crash utility cannot decompress
> lzo-rle compressed zram data since kernel 5.1 [1]. Because the current
> lzo library has no lzo-rle support, crash has to import the kernel
> source code directly, which is bad for crash utility code maintenance.
> It would be better if the lzo library were updated with lzo-rle
> support. I guess that not only crash but also other kernel debugging
> tools running in userspace, such as drgn, may need this feature.
>
> Do you have any suggestions on these?
>
> [1]: https://www.mail-archive.com/devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx/msg00475.html
>
>
> Thanks,
> Tao Liu
>
>
> >
> > Previously, some questions were raised around the RLE patches. I've done some
> > additional benchmarking to answer these questions. In short:
> >
> > - RLE offers significant additional performance (data-dependent)
> > - I didn't measure any regressions that were clearly outside the noise
> >
> >
> > One concern with this patchset was around performance - specifically, measuring
> > RLE impact separately from Matt Sealey's patches (CTZ & fast copy). I have done
> > some additional benchmarking which I hope clarifies the benefits of each part
> > of the patchset.
> >
> > Firstly, I've captured some memory via /dev/fmem from a Chromebook with many
> > tabs open which is starting to swap, and then split this into 4178 4k pages.
> > I've excluded the all-zero pages (as zram does), and also the no-zero pages
> > (which won't tell us anything about RLE performance). This should give a
> > realistic test dataset for zram. What I found was that the data is VERY
> > bimodal: 44% of pages in this dataset contain 5% or fewer zeros, and 44%
> > contain over 90% zeros (30% if you include the no-zero pages). This supports
> > the idea of special-casing zeros in zram.
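
A minimal sketch of the page classification described above, assuming a raw memory dump held in memory as bytes. The 4k page size and the 5% / 90% thresholds come from the text; the function names (split_pages, zero_fraction, summarize) are hypothetical, and this is not the script that produced the quoted numbers.

```python
PAGE_SIZE = 4096  # 4k pages, as in the text


def split_pages(dump, page_size=PAGE_SIZE):
    """Split a raw dump into full pages, dropping any trailing partial page."""
    count = len(dump) // page_size
    return [dump[i * page_size:(i + 1) * page_size] for i in range(count)]


def zero_fraction(page):
    """Fraction of the bytes in a page that are 0x00."""
    return page.count(0) / len(page)


def summarize(pages):
    """Exclude all-zero and no-zero pages, then report how bimodal the rest is.

    Returns the number of pages kept, the share of kept pages with 5% or
    fewer zeros ("low") and the share with over 90% zeros ("high").
    """
    fractions = [zero_fraction(p) for p in pages]
    kept = [f for f in fractions if 0.0 < f < 1.0]
    if not kept:
        return {"kept": 0, "low": 0.0, "high": 0.0}
    low = sum(1 for f in kept if f <= 0.05) / len(kept)
    high = sum(1 for f in kept if f > 0.90) / len(kept)
    return {"kept": len(kept), "low": low, "high": high}
```

Calling summarize(split_pages(dump)) on a dump read from a file would give the two shares to compare against the figures quoted above.
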
> >
> > Next, I've benchmarked four variants of lzo on these pages (on 64-bit Arm at
> > max frequency): baseline LZO; baseline + Matt Sealey's patches (aka MS);
> > baseline + RLE only; baseline + MS + RLE. Numbers are for weighted roundtrip
> > throughput (the weighting reflects that zram does more compression than
> > decompression).
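
The exact weighting is not given above, so the following is only a sketch of one way to compute a weighted roundtrip throughput: a weighted harmonic mean of compression and decompression throughput. The 2:1 default weights and the helper names are assumptions, and zlib stands in for LZO because Python's standard library has no LZO binding.

```python
import time
import zlib


def weighted_roundtrip_throughput(comp_mbps, decomp_mbps,
                                  w_comp=2.0, w_decomp=1.0):
    """Weighted harmonic mean of two throughputs (same unit in, same unit out).

    The 2:1 default weights are hypothetical; they only express "more
    compression than decompression".
    """
    if comp_mbps <= 0 or decomp_mbps <= 0:
        raise ValueError("throughputs must be positive")
    return (w_comp + w_decomp) / (w_comp / comp_mbps + w_decomp / decomp_mbps)


def measure_mbps(func, data, payload_size, repeats=200):
    """Throughput of func(data) in MB/s, counting payload_size bytes per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(data)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return payload_size * repeats / elapsed / 1e6


def demo():
    """Measure one synthetic 4k page: 75% zeros followed by a byte ramp."""
    page = bytes(3072) + bytes(range(256)) * 4
    compressed = zlib.compress(page)
    comp = measure_mbps(zlib.compress, page, len(page))
    decomp = measure_mbps(zlib.decompress, compressed, len(page))
    print("weighted roundtrip MB/s:", weighted_roundtrip_throughput(comp, decomp))
```

A harmonic mean is used because it corresponds to dividing the total bytes processed by the total weighted time, which is what a mixed workload actually experiences.
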
> >
> > https://drive.google.com/file/d/1VLtLjRVxgUNuWFOxaGPwJYhl_hMQXpHe/view?usp=sharing
> >
> > Matt's patches help in all cases for Arm (and have no effect on Intel), as expected.
> >
> > RLE also behaves as expected: with few zeros present, it makes no difference;
> > above ~75%, it gives a good improvement (50 - 300 MB/s on top of the benefit
> > from Matt's patches).
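
To illustrate why zero-heavy pages benefit, here is a toy zero-run encoder. It is NOT the lzo-rle bitstream or the kernel implementation; the token layout, the function names and the min_run parameter are all hypothetical and chosen only for readability.

```python
def encode_zero_runs(data, min_run=4):
    """Encode bytes as a list of ("zeros", length) and ("lit", bytes) tokens.

    Runs of zeros shorter than min_run stay inside literals, since a
    dedicated token would not pay for itself.
    """
    tokens = []
    literal = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] == 0:
            j = i
            while j < n and data[j] == 0:
                j += 1
            if j - i >= min_run:
                if literal:
                    tokens.append(("lit", bytes(literal)))
                    literal.clear()
                tokens.append(("zeros", j - i))
            else:
                literal.extend(data[i:j])
            i = j
        else:
            literal.append(data[i])
            i += 1
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def decode_zero_runs(tokens):
    """Inverse of encode_zero_runs."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "zeros":
            out.extend(bytes(value))  # bytes(n) is n zero bytes
        else:
            out.extend(value)
    return bytes(out)
```

A page with few zeros produces almost nothing but literal tokens, so the zero-run path is rarely taken; a page that is mostly zeros collapses into a handful of tokens.
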
> >
> > Best performance is seen with both MS and RLE patches.
> >
> > Finally, I have benchmarked the same dataset on an x86-64 device. Here, the
> > MS patches make no difference (as expected); RLE helps, much as it does on Arm.
> > There were no definite regressions; allowing for observational error, 0.1%
> > (3/4178) of cases had a regression > 1 standard deviation, of which the largest
> > was 4.6% (1.2 standard deviations). I think this is probably within the noise.
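
The check above can be sketched as follows, assuming each page has several repeated throughput measurements for the baseline and for the patched build. The email does not say how the standard deviation was obtained, so using the baseline sample standard deviation is an assumption, as are the function names.

```python
import statistics


def regression_in_sigmas(baseline_samples, patched_samples):
    """Return (relative slowdown, slowdown in baseline standard deviations).

    Positive values mean the patched build was slower than the baseline.
    """
    base_mean = statistics.mean(baseline_samples)
    base_sd = statistics.stdev(baseline_samples)
    drop = base_mean - statistics.mean(patched_samples)
    relative = drop / base_mean
    if base_sd > 0:
        sigmas = drop / base_sd
    else:
        sigmas = float("inf") if drop > 0 else 0.0
    return relative, sigmas


def count_regressions(per_page, threshold_sigmas=1.0):
    """Count pages whose slowdown exceeds threshold_sigmas standard deviations.

    per_page is an iterable of (baseline_samples, patched_samples) pairs.
    """
    return sum(
        1
        for baseline, patched in per_page
        if regression_in_sigmas(baseline, patched)[1] > threshold_sigmas
    )
```

With 4178 pages, even pure measurement noise would be expected to push some pages past one standard deviation, which is consistent with reading 3 such cases as noise.
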
> >
> > https://drive.google.com/file/d/1xCUVwmiGD0heEMx5gcVEmLBI4eLaageV/view?usp=sharing
> >
> > One point to note is that the graphs show RLE appears to help very slightly
> > with no zeros present! This is because the extra code causes the clang
> > optimiser to change code layout in a way that happens to have a significant
> > benefit. Taking baseline LZO and adding a do-nothing line like
> > "__builtin_prefetch(out_len);" immediately before the "goto next" has the same
> > effect. So this is a real, but basically spurious effect - it's small enough
> > not to upset the overall findings.
> >
> > Dave
> >
> >