Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees

From: Mathias Krause
Date: Wed Jun 14 2023 - 10:30:10 EST


On 14.06.23 15:57, Luiz Capitulino wrote:
> On 2023-06-14 09:45, Sven-Haegar Koch wrote:
>> Maybe this, which names the same commit as the cause as this thread does:
>>
>> Subject: Re: [PATCH] timekeeping: Align tick_sched_timer() with the HZ
>> tick. -- regression report
>> https://lore.kernel.org/lkml/5a56290d-806e-b9a5-f37c-f21958b5a8c0@xxxxxxxxxxxxxx/
>
> Thank you, Sven.
>
> Sebastian, except for the detailed analysis which we haven't done yet, the
> issue described by Mathias matches 100% what we're observing. Also, we do
> observe this on bare-metal instances which could mean that the initial
> reports are against VMs because those are rebooted more often (our quick
> reproducer boots hundreds of instances in AWS and only 1 or 2 reproduce
> this).

Yeah, we're doing VM-based testing more often than bare metal -- less so
on an AWS scale. That's why we observed it first in VMs. But that wasn't
meant to exclude bare metal, not at all. It's just that we haven't
tried hard enough yet, and testing VMs is so much more pleasant when it
comes to debugging boot issues ;)

Thanks,
Mathias

> IMHO, I'd suggest we revert this for now from Linus tree and stable trees.
> We can help testing for the fix maybe for the next merge window.
>
> - Luiz
>