Re: [PATCH RFC] x86/entry: Ask RCU if it needs rcu_irq_{enter,exit}()

From: Thomas Gleixner
Date: Fri Jun 12 2020 - 08:40:58 EST


Andy Lutomirski <luto@xxxxxxxxxx> writes:
>
> This is saying we were idle and __rcu_is_watching() correctly returned
> false. We got sysvec_apic_timer_interrupt(), which did
> rcu_irq_enter() and then turned on IRQs for softirq processing. Then
> we got sysvec_call_function_single() inside that, and
> sysvec_call_function_single() noticed that RCU was already watching
> and did *not* call rcu_irq_enter(). And, if
> sysvec_call_function_single() does rcu_is_cpu_rrupt_from_idle(), it
> will return true. So the issue is that RCU thinks that, since it
>
>
>> +static __always_inline bool rcu_needs_irq_enter(void)
>> +{
>> +       return !IS_ENABLED(CONFIG_TINY_RCU) &&
>> +               (context_tracking_enabled_cpu(smp_processor_id()) || is_idle_task(current));
>> +}
>
> x86 shouldn't need this context tracking check -- we won't even call
> this if we came from user mode, and we make sure we never run with
> IRQs on when we're in CONTEXT_USER.

As I told Paul already, it's broken.

> I think my preference would be to fix this with something more like
> tglx's patch but with an explanation:

Yes, explanation is definitely required :)

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index f0b657097a2a..93fd9d6fe033 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -571,7 +571,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
>                 return false;
>         }
>
> -       if (!__rcu_is_watching()) {
> +       if (!__rcu_is_watching() || is_idle_task(current)) {

Actually that __rcu_is_watching() check is pointless because it can
only return false when current is the idle task. If the entry came from
user mode then this path is not taken. Entry from user mode with NOHZ
full does:

  enter_from_user_mode()
    user_exit_irqoff()
      __context_tracking_exit(CONTEXT_USER)
        rcu_user_exit()
          rcu_eqs_exit(1)
            WRITE_ONCE(rdp->dynticks_nmi_nesting, DYNTICK_IRQ_NONIDLE);

So if an interrupt hits the kernel after enabling interrupts then:

  1) RCU is watching and out of EQS
  2) The dynticks_nmi_nesting counter is no longer relevant

It stays that way until returning to user mode or scheduling out to
idle, which means:

        if (is_idle_task(current))

is completely sufficient. And we don't care about the unconditional
rcu_irq_enter() in this case. If idle triggers a #PF which wants to
sleep then the RCU state is the least of our worries.
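
To make the argument above checkable, here is a small userspace toy model,
not kernel code. It assumes the one invariant stated in this mail: on a
kernel mode entry, RCU can only be "not watching" when current is the idle
task. Every name in it (struct cpu_state, cond_patch(), cond_idle_only(),
state_reachable(), count_mismatches()) is made up for illustration and does
not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. These fields stand in for the real kernel predicates. */
struct cpu_state {
	bool idle_task;    /* models is_idle_task(current) */
	bool rcu_watching; /* models __rcu_is_watching() */
};

/* Condition from the quoted diff. */
static bool cond_patch(const struct cpu_state *s)
{
	return !s->rcu_watching || s->idle_task;
}

/* Simplified condition proposed in the reply. */
static bool cond_idle_only(const struct cpu_state *s)
{
	return s->idle_task;
}

/*
 * Assumed invariant: on entry from kernel mode, RCU is not watching
 * only if current is the idle task.
 */
static bool state_reachable(const struct cpu_state *s)
{
	return s->rcu_watching || s->idle_task;
}

/* Count reachable states in which the two conditions disagree. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int idle = 0; idle <= 1; idle++) {
		for (int watching = 0; watching <= 1; watching++) {
			struct cpu_state s = {
				.idle_task = idle,
				.rcu_watching = watching,
			};

			if (!state_reachable(&s))
				continue;
			if (cond_patch(&s) != cond_idle_only(&s))
				mismatches++;
		}
	}
	return mismatches;
}
```

Calling count_mismatches() from a main() of your choice should give 0 under
that invariant. The only state where the two conditions differ is "not idle
and RCU not watching", which is exactly the state the invariant rules out.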

Thanks,

tglx