Re: Linux 6.8-rc2

From: Guenter Roeck
Date: Mon Jan 29 2024 - 14:39:23 EST


On Sun, Jan 28, 2024 at 05:13:03PM -0800, Linus Torvalds wrote:
> So we had a number of small annoying issues in rc1, including an
> amdgpu scheduling bug that could cause a hung desktop (that would
> *eventually* recover, but after a long enough timeout that most people
> probably ended up rebooting instead. That one seems to have hit a fair
> number of people.
>
> There was also a btrfs bug wrt zstd-compressed inline extents,
> although (somewhat) happily that wasn't in rc1 and got noticed and
> reverted fairly quickly, so hopefully it didn't hit very many people.
> It did me.
>
> Anyway, I hope that with rc2, we're now in the more stable part of the
> release cycle, with those kinds of problems that might affect a lot of
> testers sorted out. So hopefully the fixes will be more subtle and not
> affect common core setups.
>
> So go out and test. It's safe now. You trust me, right?
>

Build results:
total: 155 pass: 155 fail: 0
Qemu test results:
total: 549 pass: 548 fail: 1
Failed tests:
arm:mps2-an385:mps2_defconfig:mps2-an385:initrd

Caveats:
- I disabled CONFIG_WERROR for alpha, openrisc, sh, and sparc64 builds.
This is because commit 0fcb70851fbf ("Makefile.extrawarn: turn on
missing-prototypes globally") causes test builds on those architectures
to fail if CONFIG_WERROR is enabled, and I really don't want to act as
missing-prototypes police.

- I disabled CONFIG_FRAME_WARN entirely.
The warning was just getting annoying, in large part because people
just keep adding functions with large stack frames. On top of that,
the warning depends very much on the compiler and compiler
version. Finally, most of the "fixes" I have seen over the years don't
really solve the problem but just split affected functions into multiple
sub-functions, with the overall stack frame being just as large or
even larger than before. In my opinion that defeats the purpose of the
warning, making it useless.
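
To illustrate the last point, here is a minimal userspace sketch (not kernel
code). The function names and the 1024-byte buffer size are made up for the
example. It assumes a GCC/clang-style noinline attribute and a per-function
check such as -Wframe-larger-than. The split variant has two frames of about
1 KiB each instead of one frame of about 2 KiB, but both frames are live at
the same time while the helper runs, so the peak stack depth is not reduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SZ 1024	/* arbitrary size, chosen only for illustration */

/* Variant 1: both buffers sit in one frame of roughly 2 * BUF_SZ bytes. */
unsigned int sum_monolithic(unsigned char seed, size_t n)
{
	unsigned char a[BUF_SZ];
	unsigned char b[BUF_SZ];
	unsigned int sum = 0;
	size_t i;

	assert(n <= BUF_SZ);
	memset(a, seed, sizeof(a));
	memset(b, seed + 1, sizeof(b));
	for (i = 0; i < n; i++)
		sum += a[i] + b[i];
	return sum;
}

/*
 * Variant 2, helper: owns the second buffer. noinline keeps the compiler
 * from merging it back into the caller's frame.
 */
__attribute__((noinline))
static unsigned int sum_inner(const unsigned char *a, unsigned char seed,
			      size_t n)
{
	unsigned char b[BUF_SZ];
	unsigned int sum = 0;
	size_t i;

	memset(b, seed + 1, sizeof(b));
	for (i = 0; i < n; i++)
		sum += a[i] + b[i];
	return sum;
}

/*
 * Variant 2, caller: its frame holds only the first buffer, so a
 * per-function frame size check sees about BUF_SZ bytes here. The buffer
 * is still live while sum_inner() runs, so the call chain needs about
 * 2 * BUF_SZ bytes plus call overhead.
 */
unsigned int sum_split(unsigned char seed, size_t n)
{
	unsigned char a[BUF_SZ];

	assert(n <= BUF_SZ);
	memset(a, seed, sizeof(a));
	return sum_inner(a, seed, n);
}
```

Both variants compute the same result; only the way the stack usage is
distributed across frames differs. Actual frame sizes depend on compiler,
version, and optimization level.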

The mps2-an385 boot failure is due to commit 6f4c45cbcb00 ("kunit: Add
tests for csum_ipv6_magic and ip_fast_csum") which is buggy. Oddly enough,
I have only seen it with my mps2-an385 (arm nommu) boot test. A fix is
available at
https://lore.kernel.org/lkml/20240124-fix_sparse_errors_checksum_tests-v4-0-bc2b8d23a35c@xxxxxxxxxxxx/

There is a new warning seen in various boot tests:

BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:3749

This is exposed by commit 5d5dfc50e5689 ("gpiolib: remove extra_checks"),
which unconditionally enables the check. The underlying problem is that
sdhci_check_ro() disables interrupts but then (directly or indirectly)
calls mmc_gpio_get_ro() which calls gpiod_get_value_cansleep(). I am not
aware of a pending fix or what a fix should look like. Obviously, commit
5d5dfc50e5689 should not be reverted since it only exposes the problem
and did not cause it. Related discussion is at
https://lore.kernel.org/lkml/19dca2a9-36e1-4a6b-9b65-db4c0a163d56@xxxxxxxxxxxx/
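
The following toy userspace model may help show why the commit only exposes
the problem. It is not kernel code: every model_* name is hypothetical, and
it assumes that the check used to be gated by a debug-only condition, which
is what "unconditionally enables the check" implies. The caller is identical
in both cases; only the visibility of the warning changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for kernel state; none of these are real kernel symbols. */
static bool model_irqs_disabled;
static int model_warnings;

/* Models a "may sleep" annotation: complains if interrupts are disabled. */
static void model_might_sleep(void)
{
	if (model_irqs_disabled) {
		model_warnings++;
		fprintf(stderr,
			"model: sleeping function called from invalid context\n");
	}
}

/*
 * Models a sleepable pin read. With check_always == false the check only
 * runs if debug_enabled is set (assumed old behavior); with
 * check_always == true it always runs (behavior after the commit).
 */
static int model_get_value_cansleep(bool check_always, bool debug_enabled)
{
	if (check_always || debug_enabled)
		model_might_sleep();
	return 0;	/* pretend the pin reads 0 */
}

/* Models the caller as described: disable interrupts, then read the pin. */
int model_check_ro(bool check_always, bool debug_enabled)
{
	int ro;

	assert(!model_irqs_disabled);	/* the model does not nest */
	model_irqs_disabled = true;
	ro = model_get_value_cansleep(check_always, debug_enabled);
	model_irqs_disabled = false;
	return ro;
}

/* Number of warnings the model has emitted so far. */
int model_warning_count(void)
{
	return model_warnings;
}
```

In this model, model_check_ro(false, false) is silent while
model_check_ro(true, false) warns, even though the calling sequence is the
same in both cases.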

On top of that, there is at least one selftest failure.

Expected handshake_req_destroy_test == req, but
handshake_req_destroy_test == 00000000
req == c3300da0
not ok 11 req_destroy works
# Handshake API tests: pass:10 fail:1 skip:0 total:11

My system is not (yet) set up to track such failures (I only happened to
notice when browsing through logs), so I don't know if this is the only
selftest failure. I do see this in v6.6.y and v6.7.y, so it is not a
new problem. I don't know (and didn't check) if anyone is aware of it.

Guenter