Re: [PATCH 0/1] Possible bug in zram on ppc64le on vfat

From: Martin Doucha
Date: Thu Nov 10 2022 - 09:30:08 EST


On 07. 11. 22 22:25, Minchan Kim wrote:
> On Mon, Nov 07, 2022 at 08:11:35PM +0100, Petr Vorel wrote:
>> Hi all,
>>
>> the following patch tries to work around an error on ppc64le, where
>> the zram01.sh LTP test (there is also the kernel selftest
>> tools/testing/selftests/zram/zram01.sh, but the LTP test got further
>> updates) often has mem_used_total 0 although zram is already filled.
>
> Hi, Petr,
>
> Is it happening on only ppc64le?
>
> Is it a new regression? What kernel version did you use?

Hi,
I've reported the same issue on kernels 4.12.14 and 5.3.18 internally to our kernel developers at SUSE. The bug report is not public, but I'll copy the bug description here:

The new version of the LTP test zram01 found a sysfs file issue with zram devices mounted using the VFAT filesystem. When all available space is filled, e.g. by `dd if=/dev/zero of=/mnt/zram0/file`, the corresponding sysfs file /sys/block/zram0/mm_stat will report that the compressed data size on the device is 0 and total memory usage is also 0. The LTP test zram01 uses these values to calculate the compression ratio, which results in division by zero.
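
To illustrate the division by zero described above, here is a minimal sketch of a guarded ratio calculation. The helper name `mm_stat_ratio` is hypothetical, and the formula (orig_data_size divided by mem_used_total, as a percentage) is an assumption; the actual calculation in zram01 may differ. The field positions follow the mm_stat layout in the kernel zram documentation.

```shell
# Hypothetical helper (not part of LTP): compute orig_data_size / mem_used_total
# as a percentage from one mm_stat line. The formula is an assumption.
mm_stat_ratio() {
    # Split the line into positional parameters; fields 1 and 3 of mm_stat
    # are orig_data_size and mem_used_total.
    set -- $1
    orig_data_size=$1
    mem_used_total=$3

    # Refuse to divide when the kernel reports zero memory usage.
    if [ "$mem_used_total" -eq 0 ]; then
        echo "mem_used_total is 0, cannot compute ratio" >&2
        return 1
    fi

    echo $((100 * orig_data_size / mem_used_total))
}
```

Called with a line such as `4194304 0 0 26214400 327680 64 0 0`, this returns a non-zero status instead of aborting on the arithmetic error.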

The issue is specific to the PPC64LE architecture and the VFAT filesystem. No other tested filesystem has this issue and I could not reproduce it on other archs (s390 not tested). The issue appears randomly, in about one of every 3 test runs, on SLE-15SP2 and 15SP3 (kernel 5.3). It appears less frequently on SLE-12SP5 (kernel 4.12). Other SLE versions have not been tested with the new test version yet. The previous version of the test did not check the VFAT filesystem on zram devices.

I've tried to debug the issue and collected some interesting data (all values come from zram device with 25M size limit and zstd compression algorithm):
- mm_stat values are correct after mkfs.vfat:
65536 220 65536 26214400 65536 0 0 0

- mm_stat values stay correct after mount:
65536 220 65536 26214400 65536 0 0 0

- the bug is triggered by filling the filesystem to capacity (using dd):
4194304 0 0 26214400 327680 64 0 0

- adding `sleep 1` between `dd` and reading mm_stat does not help
- adding sync between `dd` and reading mm_stat appears to fix the issue:
26214400 2404 262144 26214400 327680 399 0 0
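
The `sync` workaround from the last item could be sketched roughly as follows. The helper name `read_mm_stat_synced` and its optional path argument are hypothetical and only for illustration; the first three mm_stat fields are labelled according to the kernel zram documentation (orig_data_size, compr_data_size, mem_used_total).

```shell
# Hypothetical helper: flush dirty data with sync, then read mm_stat and
# print the first three fields with labels.
read_mm_stat_synced() {
    # Default path matches the device used in the report; override for testing.
    stat_file=${1:-/sys/block/zram0/mm_stat}

    # Without this sync the values were observed to be 0 in the report above.
    sync

    # mm_stat is a single line of whitespace-separated counters.
    read -r orig compr used rest < "$stat_file"
    printf 'orig_data_size=%s compr_data_size=%s mem_used_total=%s\n' \
        "$orig" "$compr" "$used"
}
```

This only shows the ordering of `sync` and the read; it does not explain why the values are 0 without it.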

--
Martin Doucha mdoucha@xxxxxxx
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic