Re: [PATCH v5 tip/core/rcu 15/16] rcu: Make RCU-tasks wait for idle tasks

From: Paul E. McKenney
Date: Wed Aug 13 2014 - 14:20:43 EST


On Wed, Aug 13, 2014 at 04:42:19PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 13, 2014 at 07:12:17AM -0700, Paul E. McKenney wrote:
> > > That's not an excuse for doing horrible things. And inventing new infra
> > > that needs to wake all CPUs is horrible.
> >
> > Does your patch even work?
>
> Haven't even tried compiling it, but making it work shouldn't be too
> hard.
>
> > Looks like it should, and yes, the idle loop
> > seems quite a bit simpler than it was a few years ago, but we really
> > don't need some strange thing that leaves a CPU idle but not visible as
> > such to RCU.
>
> There's slightly more to it though; things like the x86 mwait idle wait
> functions tend to do far too much; for instance look at:
>
> drivers/idle/intel_idle.c:intel_idle()

OK, let's see if I can follow the idly bouncing ball.

Looks like most arches call cpu_startup_entry(), which calls
arch_cpu_idle_prepare() followed by cpu_idle_loop().

arch_cpu_idle_prepare() is local_fiq_enable() on ARM and empty elsewhere.

cpu_idle_loop() does tick_nohz_idle_enter(), which does some nohz stuff.
It also checks for offline, and invokes arch_cpu_idle_enter(), which
is empty except on x86 and ARM. On ARM, it messes with LEDs, and on
x86 it appears to disable an NMI-based watchdog timer.
Interrupts are disabled, and we do either cpu_idle_poll() if specified
or cpuidle_idle_call(). cpu_idle_poll() is the old-time idle loop,
with rcu_idle_enter()/_exit() and enabling interrupts prior to spinning.
No sign of stop_critical_timings(), though. So let's not bury
rcu_idle_enter() in stop_critical_timings().

cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched,
then does stop_critical_timings() and rcu_idle_enter(). Then we
have the buried complexity with cpuidle_select(), but a negative
return says to check need-resched and enable interrupts or to
invoke arch_cpu_idle(), which executes various sleep instructions
on various architectures. Some notable variants:

o ARM has an arm_pm_idle() function pointer so that different
SoCs can have different idle-power-down sequences. Alternatively,
some SoCs have functions named CPUNAME_do_idle(), for example,
imx5_cpu_do_idle(). These seem to invoke processor.do_idle().

Pushing rcu_idle_enter() and rcu_idle_exit() down below
arch_cpu_idle() on ARM looks to be asking for it.

o ARM64 does cpu_do_idle, which does a wait-for-interrupt instruction.

o AVR calls into assembly, hand-coding the need-resched check.
Not sure why that would still be needed.

o CRIS has one function that just enables interrupts and returns,
and anohter that enables interrupts and halts.

o UM times the duration of the idle time based on what appears to
be the time until the next event.

o Unicore does a string of what appear to be no-ops.

o x86 does a couple of levels of indirection, one being the
x86_idle() function pointer and another being the safe_halt()
function pointer.

amd_e400_idle() is a bit ornate, but still does default_idle()
which wrappers the safe_halt() pointer.

And various other architectures seem to work similarly, but lots of
hair here. So Steven, you OK with the underlying arch_cpu_idle()
functions being off-limits to tracing?

Now, if cpuidle_select() returns non-negative, we are dealing with
the CPU-idle governor, which is invoked at the later cpuidle_enter().

Hmmm... On the CPU-idle drivers...

o apm_idle_driver puts the idle loop into the ->enter() function,
apm_cpu_idle().

o ACPI puts the idle loop in acpi_idle_do_entry(), and does call
stop_critical_timings(), but not rcu_idle_enter().
So presumably stop_critical_timings() can nest? Not clear
from the code.

o The CPS driver is even stranger... Is cps_gen_entry_code()
really depositing assembly instructions into a buffer that is
passed back as a function?

o The intel_idle driver is the one with mwait_idle_with_hints(),
so you covered it below.

Your patch covers the cpuidle_enter() transition, which means
that functions like cpuidle_enter(), acpi_idle_enter_c1(), and
acpi_idle_do_entry() would be off-limits to trampolining. In the case
of CPS, quite a bit of code.

> We should push the rcu_idle_{enter,exit}() down to around
> mwait_idle_with_hints(), so we don't call half the word with RCU
> disabled.

That would be for the intel_idle.c CPU-idle driver. The other drivers
also need rcu_idle_{enter,exit}().

> > I have already said that I will be happy to rip out the wakeup code
> > when it is no longer needed, and I agree that it would be way better if
> > not needed.
>
> I'd prefer to dtrt now and not needing to fix it later.

Once it works, I might consider it "right" and adjust accordingly.
At the moment, speculation.

> Auditing all idle functions will be somewhat of a pain, but its entirely
> doable. Looking at this stuff, it appears we can clean it up massively;
> see how the generic cpuidle code already has the broadcast logic in, so
> we can remove that from the drivers by setting the right flags.

There is certainly quite a bit of hair in a number of these drivers,
no two ways about it.

> We can similarly pull out the leave_mm() call by adding a
> CPUIDLE_FLAG_TLB_FLUSH. At which point all we'd need to do is mark the
> intel_idle (and all other cpuidle_state::enter functions with __notrace.

That one seems to be specific to intel_idle. But yes, nice to avoid
waking an idle CPU for TLB flushes.

> > But I won't base a patch on hypotheticals. You have already
> > drawn way too much water from -that- well over the past years! ;-)
>
> not entirely sure what you're referring to there ;-)

Heh!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/