[PATCH 0/6] lib/lzo: performance improvements

From: Dave Rodgman
Date: Wed Nov 21 2018 - 07:06:12 EST


This patch series introduces performance improvements for lzo.

The improvements fall into two categories: general Arm-specific optimisations
(e.g., more efficient memory access); and the introduction of a special case 
for handling runs of zeros (which is a common case for zram) using run-length
encoding.

The introduction of RLE modifies the bitstream such that it can't be decoded
by old versions of lzo (the new lzo-rle can correctly decode old bitstreams).
To avoid possible issues where data is persisted on disk (e.g., squashfs), the 
final patch in this series separates lzo-rle into a separate algorithm
alongside lzo, so that the new lzo-rle is (by default) only used for zram and
must be explicitly selected for other use-cases. This final patch could be
omitted if the consensus is that we'd rather avoid proliferation of lzo
variants.

Overall, performance is improved by around 1.1 - 4.8x (data-dependent: data
with many zero runs shows higher improvement). Under real-world testing with
zram, time spent in (de)compression during swapping is reduced by around 27%.
The graph below shows the weighted round-trip throughput of lzo, lz4 and
lzo-rle, for randomly generated 4k chunks of data with varying levels of
entropy. (To calculate weighted round-trip throughput, compression performance
is emphasised to reflect the fact that zram does around 2.25x more compression
than decompression. (Results and overall trends are fairly similar for
unweighted).

https://drive.google.com/file/d/18GU4pgRVCLNN7wXxynz-8R2ygrY2IdyE/view

Contributors:
Dave Rodgman <dave.rodgman@xxxxxxx>
Matt Sealey <matt.sealey@xxxxxxx>