Re: [PATCH v5 6/9] slub: Delay freezing of partial slabs

From: Vlastimil Babka
Date: Wed Nov 22 2023 - 03:52:53 EST


On 11/21/23 19:21, Mark Brown wrote:
> On Tue, Nov 21, 2023 at 11:47:26PM +0800, Chengming Zhou wrote:
>
>> Ah yes, there is no NMI on ARM, so CPU 3 may be running somewhere with
>> interrupts disabled. I searched the full log, but still don't have a clue.
>> And there isn't any WARNING or BUG related to SLUB in the log.
>
> Yeah, nor anything else particularly. I tried turning on some debug
> options:
>
> CONFIG_SOFTLOCKUP_DETECTOR=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_WQ_WATCHDOG=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_DEBUG_LOCKING=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
>
> https://validation.linaro.org/scheduler/job/4017828
>
> which has some additional warnings related to clock changes but AFAICT
> those come from today's -next rather than the debug stuff:
>
> https://validation.linaro.org/scheduler/job/4017823
>
> so that's not super helpful.

For the record (and to help focus the debugging): we discussed on IRC that
the problem persists with CONFIG_SLUB_CPU_PARTIAL=n:
https://validation.linaro.org/scheduler/job/4017863
That limits the scope of where to look, so that's good :)

>> I wonder how to reproduce it locally with a QEMU VM, since I don't have
>> the ARM machine.
>
> There are sample qemu jobs available from, for example, KernelCI:
>
> https://storage.kernelci.org/next/master/next-20231120/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-virt-gicv3.html
>
> (includes the command line, though it's not using Debian testing like my
> test was). Note that I'm testing a bunch of platforms with the same
> kernel/rootfs combination and it was only the Raspberry Pi 3 which blew
> up. It is a bit tight for memory which might have some influence?
>
> I'm really suspecting this may have made some underlying platform bug
> more obvious :/