Re: [PATCH] tick: Detect and fix jiffies update stall

From: Paul E. McKenney
Date: Wed Feb 02 2022 - 12:24:45 EST


On Wed, Feb 02, 2022 at 03:19:51AM +0100, Frederic Weisbecker wrote:
> On Tue, Feb 01, 2022 at 05:49:34PM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 02, 2022 at 01:01:07AM +0100, Frederic Weisbecker wrote:
> > > In some rare cases, the timekeeper CPU may be delaying its jiffies
> > > update duty for a while. Known causes include:
> > >
> > > * The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
> > > or MULTI_STOP_RUN state. Disabled interrupts prevent timekeeping
> > > updates while waiting for the target CPU to complete its
> > > stop_machine() callback.
> > >
> > > * The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
> > > on the host.
> > >
> > > Detect and fix these situations with emergency timekeeping catchups.
> > >
> > > Original-patch-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >
> > Nice, thank you!
> >
> > So I should revert your earlier patch, apply this one, and then test
> > the result?
>
> No need to revert the nohz_full fix, this new one deals with non-dynticks
> issues. This way we cover every timekeeper stall situation:
>
> _ dynticks-idle is handled on IRQ entry
>
> _ full dynticks is handled on IRQ entry in case of CPU 0 (traditional nohz_full
> timekeeper) timekeeping stall. Let's hope we won't need to handle syscalls and
> faults as well but we'll see...
>
> _ periodic ticks are now handled on the tick.
>
> So you just need to apply this patch on your dev branch for testing.
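
A minimal userspace sketch of the periodic-tick case described above, under
stated assumptions: on each local tick, check whether jiffies has moved since
the previous tick, and force a catch-up once it has been stuck for too many
ticks in a row. This is not the patch itself. Every name here (sim_jiffies,
struct tick_state, MAX_STALLED_TICKS, force_jiffies_update(),
tick_check_stall()) is hypothetical, the threshold of 5 is arbitrary, and the
catch-up simply adds the missed tick count rather than reading a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: consecutive ticks without jiffies progress. */
#define MAX_STALLED_TICKS 5

/* Stand-in for the kernel's global jiffies counter (assumption). */
static unsigned long sim_jiffies;

/* Hypothetical per-CPU state kept across ticks. */
struct tick_state {
	unsigned long last_seen_jiffies;	/* jiffies value seen on the previous tick */
	unsigned int stalled_ticks;		/* ticks in a row with no progress */
};

/* Stand-in for the emergency catch-up: credit the ticks that were missed. */
static void force_jiffies_update(unsigned long missed)
{
	sim_jiffies += missed;
}

/* Call once per local tick. Returns true if a catch-up was forced. */
static bool tick_check_stall(struct tick_state *ts)
{
	assert(ts != NULL);

	if (ts->last_seen_jiffies != sim_jiffies) {
		/* The timekeeper made progress: reset the stall counter. */
		ts->stalled_ticks = 0;
		ts->last_seen_jiffies = sim_jiffies;
		return false;
	}

	/* No progress since the last tick: count it, act only at the threshold. */
	if (++ts->stalled_ticks < MAX_STALLED_TICKS)
		return false;

	force_jiffies_update(ts->stalled_ticks);
	ts->stalled_ticks = 0;
	ts->last_seen_jiffies = sim_jiffies;
	return true;
}
```

With sim_jiffies left untouched, the fifth consecutive call to
tick_check_stall() after the last observed progress forces the update and
resets the counter. Any change to sim_jiffies in between resets the count
instead.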

I have pulled it in, thank you! I will beat on it.

I am guessing that this goes up some other path to mainline, so I have
marked it "EXP".

Thanx, Paul