Re: Observing RCU stalls in kernel 5.4/5.10/5.15/6.1 stable trees

From: Luiz Capitulino
Date: Wed Jun 14 2023 - 09:58:16 EST

On 2023-06-14 09:45, Sven-Haegar Koch wrote:

On Wed, 14 Jun 2023, Luiz Capitulino wrote:

On 2023-06-14 05:20, Sebastian Andrzej Siewior wrote:

On 2023-06-14 11:14:49 [+0200], gregkh@xxxxxxxxxxxxxxxxxxx wrote:
Oops, missed this.

Yes, there might be. Can you do a 'git bisect' and track down the patch
that fixed this?

There was a report of a lockup during boot in VMs yesterday. If I
remember correctly, it still exists and might be related to this
report. I'm going to have a look.

Thanks, Sebastian. Do you have a link for the discussion?

Maybe this one, which blames the same commit as this thread:

Subject: Re: [PATCH] timekeeping: Align tick_sched_timer() with the HZ tick. -- regression report
https://lore.kernel.org/lkml/5a56290d-806e-b9a5-f37c-f21958b5a8c0@xxxxxxxxxxxxxx/

Thank you, Sven.

Sebastian, apart from the detailed analysis (which we haven't done yet), the
issue described by Mathias matches what we're observing 100%. We also
observe this on bare-metal instances, which could mean that the initial
reports involve VMs only because those are rebooted more often (our quick
reproducer boots hundreds of instances in AWS, and only 1 or 2 of them reproduce this).

IMHO, we should revert this for now from Linus' tree and the stable trees.
We can help test the fix, perhaps in time for the next merge window.

- Luiz


It may not have been the best idea to respond with such a big analysis to a
three-month-old dead thread; it gets lost extremely easily.

c'ya
sven-haegar

--
Three may keep a secret, if two of them are dead.
- Ben F.