Re: [PATCH V3 1/6] sched: idle: Add a weak arch_cpu_idle_poll function

From: Peter Zijlstra
Date: Mon Nov 10 2014 - 10:17:46 EST


On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
> Hi Peter,
>
> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> > On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> >> The poll function is called when a timer expired or if we force to poll when
> >> the cpu_idle_force_poll option is set.
> >>
> >> The poll function does:
> >>
> >> local_irq_enable();
> >> while (!tif_need_resched())
> >> cpu_relax();
> >>
> >> This default poll function suits for the x86 arch because its rep; nop;
> >> hardware power optimization. But on other archs, this optimization does not
> >> exists and we are not saving power. The arch specific bits may want to
> >> optimize this loop by adding their own optimization.
> >
> > This doesn't make sense to me; should an arch not either implement an
> > actual idle driver or implement cpu_relax() properly, why allow for a
> > third weird option?
> >
>
> The previous version of this patch simply invoked cpu_idle_loop() for
> cases where latency_req was 0. This would have changed the behavior
> on PowerPC wherein earlier the 0th idle index was returned which is also
> a polling loop but differs from cpu_idle_loop() in two ways:
>
> a. It polls at a relatively lower power state than cpu_relax().
> b. We set certain registers to indicate that the cpu is idle.

So I'm confused; the current code runs the generic cpu_relax idle poll
loop for the broadcast case. I suppose you want to retain this because
not doing your a-b above will indeed give you a lower latency.

Therefore one could argue that latency_req==0 should indeed use this,
and your a-b idle state should be latency_req==1 or higher.

Thus yes it changes behaviour, but I think it actually fixes something.
You cannot have a latency_req==0 state which has higher latency than the
actual polling loop, as you appear to have.

> Hence for all such cases wherein the cpu is required to poll while idle
> (only for cases such as force poll, broadcast ipi to arrive soon and
> latency_req = 0), we should be able to call into cpuidle_idle_loop()
> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
> (The 0th idle state is expected to be a polling loop with
> exit_latency = 0).
>
> If otherwise, it would mean the driver has an optimized polling loop
> when idle. But instead of adding in the logic of checking the
> exit_latency, we thought it would be simpler to call into an arch
> defined polling idle loop under the above circumstances. If that is no
> better we could fall back to cpuidle_idle_loop().

That still doesn't make sense to me; suppose the implementation of this
special poll state differs on different uarchs for the same arch, then
we'll end up with another registration and selection interface parallel
to cpuidle.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/