Re: [PATCH x86/entry: Force rcu_irq_enter() when in idle task

From: Paul E. McKenney
Date: Fri Jun 12 2020 - 15:19:31 EST


On Fri, Jun 12, 2020 at 10:49:53AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 12, 2020 at 03:55:00PM +0200, Thomas Gleixner wrote:
> > The idea of conditionally calling into rcu_irq_enter() only when RCU is
> > not watching turned out to be not completely thought through.
> >
> > Paul noticed occasional premature end of grace periods in RCU torture
> > testing. Bisection led to the commit which made the invocation of
> > rcu_irq_enter() conditional on !rcu_is_watching().
> >
> > It turned out that this conditional breaks RCU assumptions about the idle
> > task when the scheduler tick happens to be a nested interrupt. Nested
> > interrupts can happen when the first interrupt invokes softirq processing
> > on return which enables interrupts. If that nested tick interrupt does not
> > invoke rcu_irq_enter() then the nest accounting in RCU claims that this is
> > the first interrupt which might mark a quiescient state and end grace
> > periods prematurely.
>
> For this last sentence, how about the following?
>
> If that nested tick interrupt does not invoke rcu_irq_enter() then the
> RCU's irq-nesting checks will believe that this interrupt came directly
> from idle, which will cause RCU to report a quiescent state. Because
> this interrupt instead came from a softirq handler which might have
> been executing an RCU read-side critical section, this can cause the
> grace period to end prematurely.
>
> > Change the condition from !rcu_is_watching() to is_idle_task(current) which
> > enforces that interrupts in the idle task unconditionally invoke
> > rcu_irq_enter() independent of the RCU state.
> >
> > This is also correct vs. user mode entries in NOHZ full scenarios because
> > user mode entries bring RCU out of EQS and force the RCU irq nesting state
> > accounting to nested. As only the first interrupt can enter from user mode
> > a nested tick interrupt will enter from kernel mode and as the nesting
> > state accounting is forced to nesting it will not do anything stupid even
> > if rcu_irq_enter() has not been invoked.
>
> On the testing front, just like with my busted patch yesterday, this
> patch breaks the TASKS03 rcutorture scenario by preventing the Tasks
> RCU grace periods from ever completing. However, this is an unusual
> configuration with NO_HZ_FULL and one CPU actually being nohz_full.
> The more conventional TASKS01 and TASKS02 scenarios do just fine.
>
> I will therefore address this issue in a follow-on patch.

I should add that -your- patch from yesterday did -not- cause this
problem, in case that is of interest.

Thanx, Paul

> > Fixes: 3eeec3858488 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()")
> > Reported-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Reviewed-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Tested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
>
> > ---
> > arch/x86/entry/common.c | 35 ++++++++++++++++++++++++++++-------
> > 1 file changed, 28 insertions(+), 7 deletions(-)
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -557,14 +557,34 @@ bool noinstr idtentry_enter_cond_rcu(str
> > return false;
> > }
> >
> > - if (!__rcu_is_watching()) {
> > + /*
> > + * If this entry hit the idle task invoke rcu_irq_enter() whether
> > + * RCU is watching or not.
> > + *
> > + * Interupts can nest when the first interrupt invokes softirq
> > + * processing on return which enables interrupts.
> > + *
> > + * Scheduler ticks in the idle task can mark quiescent state and
> > + * terminate a grace period, if and only if the timer interrupt is
> > + * not nested into another interrupt.
> > + *
> > + * Checking for __rcu_is_watching() here would prevent the nesting
> > + * interrupt to invoke rcu_irq_enter(). If that nested interrupt is
> > + * the tick then rcu_flavor_sched_clock_irq() would wrongfully
> > + * assume that it is the first interupt and eventually claim
> > + * quiescient state and end grace periods prematurely.
> > + *
> > + * Unconditionally invoke rcu_irq_enter() so RCU state stays
> > + * consistent.
> > + *
> > + * TINY_RCU does not support EQS, so let the compiler eliminate
> > + * this part when enabled.
> > + */
> > + if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
> > /*
> > * If RCU is not watching then the same careful
> > * sequence vs. lockdep and tracing is required
> > * as in enter_from_user_mode().
> > - *
> > - * This only happens for IRQs that hit the idle
> > - * loop, i.e. if idle is not using MWAIT.
> > */
> > lockdep_hardirqs_off(CALLER_ADDR0);
> > rcu_irq_enter();
> > @@ -576,9 +596,10 @@ bool noinstr idtentry_enter_cond_rcu(str
> > }
> >
> > /*
> > - * If RCU is watching then RCU only wants to check
> > - * whether it needs to restart the tick in NOHZ
> > - * mode.
> > + * If RCU is watching then RCU only wants to check whether it needs
> > + * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
> > + * already contains a warning when RCU is not watching, so no point
> > + * in having another one here.
> > */
> > instrumentation_begin();
> > rcu_irq_enter_check_tick();