Re: [PATCH 5.15 000/159] 5.15.116-rc1 review

From: Guenter Roeck
Date: Sun Jun 11 2023 - 21:12:42 EST


On 6/11/23 08:14, Guenter Roeck wrote:
On 6/10/23 14:14, Guenter Roeck wrote:
Hi,

On 6/10/23 12:23, Pavel Machek wrote:
Hi!

Build results:
    total: 155 pass: 155 fail: 0
Qemu test results:
    total: 499 pass: 498 fail: 1
Failed tests:
    arm:kudo-bmc:multi_v7_defconfig:npcm:usb0.1:nuvoton-npcm730-kudo:rootfs

The test failure is spurious and not new. I observe it randomly on
multi_v7_defconfig builds, primarily on npcm platforms. There is no error
message, just a stalled boot. I have been trying to bisect for a while,
but I have not been successful so far. No immediate concern; I just wanted
to mention it in case someone else hits the same or a similar problem.


I managed to revise my bisect script enough to get reliable
results. It looks like the culprit is commit 503e554782c9 ("debugobject:
Ensure pool refill (again)"); see bisect log below. Bisects on four
different systems all produced the same result. After reverting this patch,
I do not see the problem anymore (again, confirmed on four different
systems). If anyone has an idea how to debug this, please let me know.
I'll be happy to give it a try.
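The message does not say how the bisect script was revised. One common approach for a failure that appears only on some boots is to boot each candidate several times and call it bad if any boot fails. The function below sketches that approach. It is an assumption, not Guenter's actual script, and the name `repeat_test` is hypothetical.

```shell
#!/bin/sh
# repeat_test N CMD [ARGS...]  (hypothetical helper, not from the thread)
# Runs CMD up to N times and stops at the first failure, so an intermittent
# failure is less likely to let a bad commit pass as good.
#   exit status 0: all N runs passed -> "git bisect run" marks the commit good
#   exit status 1: some run failed   -> "git bisect run" marks the commit bad
repeat_test() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        if ! "$@"; then
            echo "run $i of $n failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

Usage might look like `git bisect run sh -c '. /tmp/repeat_test.sh && repeat_test 20 timeout 600 /tmp/boot-test.sh'`. Here /tmp/repeat_test.sh holds the function above, and boot-test.sh is a hypothetical script that boots an already-built kernel in qemu once. Because the stall prints no error message, a timeout is what turns a hang into a failure. GNU coreutils `timeout` exits with status 124 when it fires. The number of runs is a trade-off. If a broken commit stalls on each boot with probability p, it survives N boots with probability (1 - p)^N. For a hypothetical p = 0.1 and N = 20, that is still about 0.12, so rarer failures need more runs.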

You may want to comment out debug_objects_fill_pool() in
debug_object_activate() or debug_object_assert_init() to see which one is
causing the failure...

CONFIG_PREEMPT_RT is disabled for you, right? (Should 5.15 even have
that option?)


CONFIG_PREEMPT_RT is disabled (it depends on ARCH_SUPPORTS_RT which is not
enabled by any architecture in v5.15.y).

The added call in debug_object_activate() triggers the problem.
Any idea what to do about it or how to debug it further?


I did some more debugging. The call to debug_object_activate()
from debug_hrtimer_activate() causes the immediate problem, and the
call from debug_timer_activate() causes a second, less frequent problem
in which the stall is seen during reboot.

In other words, the problem is (only) seen if DEBUG_OBJECTS_TIMERS
is enabled.


Below is a reverse bisect log between v5.15 and v6.1, which identifies the
commit that fixed the problem. The fix is all but impossible to backport,
and I still have no idea what is actually going on. I think I'll just disable
DEBUG_OBJECTS_TIMERS in affected tests of v5.15.y.
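Here is a minimal sketch of that workaround. It assumes the affected test configurations are built from a defconfig plus a fragment of extra options, and the fragment below is hypothetical. In kernel .config syntax, it comes down to a single line. One way to apply it is with scripts/kconfig/merge_config.sh. Another is to run `scripts/config --disable DEBUG_OBJECTS_TIMERS` on an existing .config, followed by `make olddefconfig` for the target ARCH.

```
# Hypothetical fragment for the affected v5.15.y qemu tests: other
# debug objects checks stay as configured, only timer tracking is dropped.
# CONFIG_DEBUG_OBJECTS_TIMERS is not set
```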

Guenter

---
# fixed: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
# broken: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect start 'v6.1' 'v5.15'
# broken: [7fa2e481ff2fee20e0338d98489eb9f513ada45f] Merge branch 'big-tcp'
git bisect broken 7fa2e481ff2fee20e0338d98489eb9f513ada45f
# fixed: [9e2e5ea3b28f81512c792f30729edb1db0c21f6a] Merge tag 'usb-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect fixed 9e2e5ea3b28f81512c792f30729edb1db0c21f6a
# fixed: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag 'staging-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect fixed 4ad680f083ec360e0991c453e18a38ed9ae500d7
# fixed: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
git bisect fixed 2518f226c60d8e04d18ba4295500a5b0b8ac7659
# broken: [fea3043314f30a87ca04fd1219661810600e256f] Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect broken fea3043314f30a87ca04fd1219661810600e256f
# broken: [f8122500a039abeabfff41b0ad8b6a2c94c1107d] Merge branch 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux into drm-next
git bisect broken f8122500a039abeabfff41b0ad8b6a2c94c1107d
# fixed: [7e062cda7d90543ac8c7700fc7c5527d0c0f22ad] Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect fixed 7e062cda7d90543ac8c7700fc7c5527d0c0f22ad
# broken: [9fa87dd23251574a29cf948fd16cf39075762f3e] Merge tag 'linux-can-next-for-5.19-20220523' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
git bisect broken 9fa87dd23251574a29cf948fd16cf39075762f3e
# fixed: [88a618920e9baabc1780479e2fbb68e5551d0563] Merge tag 'docs-5.19' of git://git.lwn.net/linux
git bisect fixed 88a618920e9baabc1780479e2fbb68e5551d0563
# broken: [fdaf9a5840acaab18694a19e0eb0aa51162eeeed] Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
git bisect broken fdaf9a5840acaab18694a19e0eb0aa51162eeeed
# broken: [164f9fcb21cc9a144ca9ebcf85b00c49537f6be2] docs/ja_JP/SubmittingPatches: Suggest the use of scripts/get_maintainer.pl
git bisect broken 164f9fcb21cc9a144ca9ebcf85b00c49537f6be2
# broken: [2e17ce1106e04a7f3a83796ec623881487f75dd3] Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
git bisect broken 2e17ce1106e04a7f3a83796ec623881487f75dd3
# fixed: [701850dc0c31bfadf75a0a74af7d2c97859945ec] printk, tracing: fix console tracepoint
git bisect fixed 701850dc0c31bfadf75a0a74af7d2c97859945ec
# broken: [1fc0ca9e0db61882208650b3603071e9f4b5cfee] printk: add con_printk() macro for console details
git bisect broken 1fc0ca9e0db61882208650b3603071e9f4b5cfee
# broken: [2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2] printk: add functions to prefer direct printing
git bisect broken 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2
# fixed: [8e274732115f63c1d09136284431b3555bd5cc56] printk: extend console_lock for per-console locking
git bisect fixed 8e274732115f63c1d09136284431b3555bd5cc56
# fixed: [09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7] printk: add kthread console printers
git bisect fixed 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7
# first fixed commit: [09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7] printk: add kthread console printers
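The log above is a reverse bisect. It uses git's custom terms: the older state (v5.15) is "broken", the newer one (v6.1) is "fixed", and `git bisect fixed` and `git bisect broken` take the place of the usual `bad` and `good`. The sketch below demonstrates the mechanism on a throwaway repository. The repository, the file name `state`, and the commit contents are all made up. One detail is easy to get wrong: `git bisect run` still maps exit status 0 to the old term, which here means "broken".

```shell
#!/bin/sh
# Reverse bisect demo on a made-up repository: find the commit that FIXED
# a problem, using custom terms "broken" (old state) and "fixed" (new state).

make_demo_repo() {
    # Commits 1-5 contain "broken" in file "state"; commits 6-8 contain "fixed".
    repo=$(mktemp -d) || return 1
    cd "$repo" || return 1
    git init -q
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    for i in 1 2 3 4 5 6 7 8; do
        if [ "$i" -ge 6 ]; then echo fixed > state; else echo broken > state; fi
        echo "$i" > counter   # ensures every commit has a change to record
        git add state counter
        git commit -q -m "commit $i"
    done
}

find_first_fixed() {
    root=$(git rev-list --max-parents=0 HEAD)
    # The new-state (fixed) revision comes first, then the old-state (broken) one.
    git bisect start --term-old=broken --term-new=fixed HEAD "$root" >/dev/null
    # Exit 0 marks the commit with the old term (broken), exit 1 with the new term (fixed).
    git bisect run sh -c 'if grep -q fixed state; then exit 1; else exit 0; fi' >/dev/null
    # After the run, refs/bisect/fixed points at the first fixed commit.
    git log -1 --format=%s refs/bisect/fixed
    git bisect reset >/dev/null 2>&1
}
```

Running `make_demo_repo && find_first_fixed` should print the subject of the first commit containing "fixed". Against a kernel tree, the equivalent start would be along the lines of `git bisect start --term-old=broken --term-new=fixed v6.1 v5.15`.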