Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'

From: Guenter Roeck
Date: Mon Feb 27 2017 - 15:39:02 EST


On Mon, Feb 27, 2017 at 08:34:55PM +0100, Sven Schmidt wrote:
> Hi Guenter,
>
> thanks for your testing!
>
> I must admit, I'm fairly new to kernel development and a little overwhelmed by all the tools involved.
> So I do not really know how to reproduce your test using your script. I installed qemu from the master branch and buildroot.
> Unfortunately, that's the point where I'm stuck. I would be grateful if you could give me some pointers on how to continue.
> Should I build the kernel using ARCH=nios2 and a defconfig and pass it to qemu? What arguments do I provide to that script
> (especially the machine parameter)?
>

run-qemu-nios2.sh doesn't need any parameters, though you would have to update
PATH_NIOS2 to match your toolchain and QEMU to match the qemu binary location.
Otherwise just run the script from your linux repository.

You can also build a nios2 image using 10m50_defconfig and run qemu directly.
Just remember to enable CONFIG_NIOS2_PASS_CMDLINE=y and CONFIG_BLK_DEV_INITRD=y.
CONFIG_BLK_DEV_INITRD=y enables CONFIG_RD_LZ4, which is what triggers the problem.
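
If you want to double-check that the resulting .config really has those
options set, something along these lines might help. This is only a rough
sketch: check_nios2_config is a made-up helper name, not something that exists
in the kernel tree, and it does nothing more than grep for the three symbols
mentioned above.

```shell
#!/bin/sh
# Report whether the options named in this mail are set to "y" in a kernel
# .config. check_nios2_config is a hypothetical helper, not part of the kernel.
# Usage: check_nios2_config [path-to-.config]   (defaults to ./.config)
check_nios2_config() {
    config_file=${1:-.config}
    missing=0
    for opt in CONFIG_NIOS2_PASS_CMDLINE CONFIG_BLK_DEV_INITRD CONFIG_RD_LZ4; do
        # Match the whole line so that "=m" or "# ... is not set" do not count.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok: $opt"
        else
            echo "missing: $opt"
            missing=1
        fi
    done
    # 0 if all three options are enabled, 1 otherwise.
    return "$missing"
}
```

Run it from the top of the linux tree after configuring; a "missing" line
for CONFIG_RD_LZ4 would mean the image is not exercising the LZ4 code at all.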

path-to-qemu/qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
--append "rdinit=/sbin/init" \
-initrd busybox-nios2.cpio \
-nographic -monitor none

should do it (assuming you copied the root file system from
https://github.com/groeck/linux-build-test/blob/master/rootfs/nios2/busybox-nios2.cpio).
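
Since the failure shows up as a hang with no console output, it may be
convenient to wrap the command in a time limit. The following is an untested
sketch, not what my script does: boot_check is a made-up name, the 60 second
limit and the "Linux version" marker in the usage comment are assumptions,
and it relies on the coreutils 'timeout' tool being installed.

```shell
#!/bin/sh
# Run a command under a time limit and classify what it printed.
# boot_check is a hypothetical helper; 'timeout' is the coreutils tool.
# Usage: boot_check SECONDS PATTERN COMMAND [ARGS...]
# Returns 0 if PATTERN was seen, 1 if there was other output only,
# 2 if there was no output at all (the symptom described in this thread).
boot_check() {
    limit=$1
    pattern=$2
    shift 2
    log=$(mktemp)
    # Capture stdout and stderr; read stdin from /dev/null so the command
    # never blocks on the terminal.
    timeout "$limit" "$@" >"$log" 2>&1 </dev/null
    status=$?
    if [ ! -s "$log" ]; then
        echo "no console output (exit status $status, limit ${limit}s)"
        result=2
    elif grep -q -- "$pattern" "$log"; then
        echo "booted: saw '$pattern'"
        result=0
    else
        echo "failed: output seen but no '$pattern' (exit status $status)"
        result=1
    fi
    rm -f "$log"
    return "$result"
}

# Example (assumed values, adjust to taste):
#   boot_check 60 "Linux version" \
#       path-to-qemu/qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
#       -dtb arch/nios2/boot/dts/10m50_devboard.dtb \
#       --append "rdinit=/sbin/init" \
#       -initrd busybox-nios2.cpio \
#       -nographic -monitor none
```

The return values are chosen so that a script built around this could be
handed to "git bisect run", which treats 0 as good and 1 to 127 (except 125)
as bad.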

> On Sun, Feb 26, 2017 at 01:03:38PM -0800, Guenter Roeck wrote:
> > Hi Sven,
> >
> > my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> > update LZ4 compressor module"). The test hangs early during boot before
> > any console output is seen. Reverting the offending patch as well as the
> > subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> > and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
> > at the top of the LZ4 decompression code). For reference, bisect log
> > is attached.
> >
>
> So it seems like it's the decompressor? Which decompression code do you mean exactly? LZ4_decompress_fast/_safe/_generic?
> Since the decompression functions worked fine in all previous tests, and this is a problem during boot, my first guess would
> be lib/decompress_unlz4.c, which provides the functions for decompressing an LZ4-compressed kernel image.
> But then it should only cause problems when the kernel image is compressed, shouldn't it?
>
I am booting the uncompressed kernel. Also, if I disable CONFIG_RD_LZ4
in the configuration, everything works just fine. Just the _presence_ of
the decompression code seems to trigger the problem. No idea if enabling
CONFIG_RD_LZ4 results in some LZ4 compressed data being generated and
built into the image (I do see usr/initramfs_data.cpio.lz4).
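
For anyone who wants to check whether that file really is LZ4 data, looking
at its first four bytes should be enough. A rough sketch follows;
is_lz4_legacy is a made-up helper name, and the magic value is an assumption
on my part: as far as I recall, lib/decompress_unlz4.c expects the legacy
format magic 0x184C2102 (ARCHIVE_MAGICNUMBER), stored little-endian.

```shell
#!/bin/sh
# Succeed if the file starts with the LZ4 legacy-format magic.
# is_lz4_legacy is a hypothetical helper name.
# Assumption: the magic is 0x184C2102, i.e. bytes 02 21 4c 18 on disk.
# Usage: is_lz4_legacy FILE
is_lz4_legacy() {
    # Dump the first four bytes as hex and strip od's spacing.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "02214c18" ]
}

# Example:
#   is_lz4_legacy usr/initramfs_data.cpio.lz4 && echo "LZ4 legacy format"
```

This only looks at the header; it says nothing about whether the rest of
the file decompresses correctly.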

I added "return -EINVAL;" to the top of LZ4_decompress_generic(), which
also helped. Adding it to the individual decompression functions seemed
to be an on/off thing; sometimes it helped, sometimes not.

> > I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> > and binutils 2.26.1. Scripts used to run the tests are available at
> > https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> > Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
> >
> > I tried to track down the problem, with no success. Just the presence
> > of the LZ4 code seems to be sufficient to cause the problem; I have
> > no idea why that would be the case.
> >
>
> Maybe there's someone who has an idea and/or is experiencing similar issues. Hopefully, we can track this down.
>

Agreed. For my part I am pretty much out of ideas. I could explicitly
disable CONFIG_RD_LZ4 in my tests, but that would really just defeat
the purpose.

Guenter

> > Please let me know if there is anything I can do to help track down
> > the problem.
> >
> > Thanks,
> > Guenter
> >
> > ---
> > # bad: [c4f3f22eddc982d247ffe2a6690c3e4a5c46dd09] Merge tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
> > # good: [9e314890292c0dd357eadef6a043704fa0b4c157] Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
> > git bisect start 'HEAD' '9e31489'
> > # bad: [7067739df23ffd641ca99c967830e0ed2ba39eab] Merge branch 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
> > git bisect bad 7067739df23ffd641ca99c967830e0ed2ba39eab
> > # good: [c5adae9583ef6985875532904160c6bf9f07b453] lib: add CONFIG_TEST_SORT to enable self-test of sort()
> > git bisect good c5adae9583ef6985875532904160c6bf9f07b453
> > # bad: [edccb59429657b09806146339e2b27594c1d1da0] Merge tag 'fbdev-v4.11' of git://github.com/bzolnier/linux
> > git bisect bad edccb59429657b09806146339e2b27594c1d1da0
> > # good: [72db33355c1431fefcabb06c9c25705e8226eed5] fbdev: ssd1307fb: Start to use gpiod API for reset gpio
> > git bisect good 72db33355c1431fefcabb06c9c25705e8226eed5
> > # bad: [95330473636e5e4546f94874c957c3be66bb2140] checkpatch: remove false unbalanced braces warning
> > git bisect bad 95330473636e5e4546f94874c957c3be66bb2140
> > # bad: [69c78423b8f439b077929410bdf8f88e7031b891] lib/lz4: remove back-compat wrappers
> > git bisect bad 69c78423b8f439b077929410bdf8f88e7031b891
> > # bad: [e23d54e48346e775be53b3cc25a95d65da960393] lib/decompress_unlz4: change module to work with new LZ4 module version
> > git bisect bad e23d54e48346e775be53b3cc25a95d65da960393
> > # bad: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module
> > git bisect bad 4e1a33b105ddf201f66dcc44490c6086a25eca0b
> > # good: [8893f519330bb073a49c5b4676fce4be6f1be15d] lib/test_sort.c: make it explicitly non-modular
> > git bisect good 8893f519330bb073a49c5b4676fce4be6f1be15d
> > # first bad commit: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module
>
> Thank you,
>
> Sven