Re: Prototype patch to avoid TREE07 rcu_torture_writer() stalls

From: Paul E. McKenney
Date: Sat Jan 06 2024 - 19:49:56 EST


On Sat, Jan 06, 2024 at 11:36:03PM +0100, Frederic Weisbecker wrote:
> On Sat, Jan 06, 2024 at 06:55:14AM -0800, Paul E. McKenney wrote:
> > > Is this related?
> > >
> > > But then the system picks itself up, dusts itself off, and goes along
> > > as if nothing had happened.
> > >
> > > Maybe a long-running IRQ, NMI, or SMI?
> >
> > Or, based on a recent bug chase of another type, high contention on
> > an IRQ-disabled spinlock?
>
> Before checking the guest's dmesg, I should probably have checked the host's.
> It seems to report some softlockups, perhaps because too many instances
> were running in parallel on a host where memory is not that generous.

That would do it!

> Let me try running for the same duration (250 hours) but with fewer
> instances in parallel.

I just today saw an extended stall on one instance of TREE03, also with
RCU grace-period kthread starvation. But this was in -next, which
is also having other as-yet-unanalyzed issues.

Thanx, Paul