Re: [PATCH 1/3] cpuidle,x86: increase forced cut-off for polling to 20us

From: Daniel Lezcano
Date: Thu Oct 29 2015 - 09:02:30 EST


On 10/29/2015 12:54 PM, Rik van Riel wrote:
> On 10/29/2015 06:17 AM, Daniel Lezcano wrote:
>> On 10/28/2015 11:46 PM, riel@xxxxxxxxxx wrote:
>>> From: Rik van Riel <riel@xxxxxxxxxx>
>>>
>>> The cpuidle menu governor has a forced cut-off for polling at 5us,
>>> in order to deal with firmware that gives the OS bad information
>>> on cpuidle states, leading to the system spending way too much time
>>> in polling.
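
For context, the cut-off being discussed is a check in menu_select()
in drivers/cpuidle/governors/menu.c, which looks roughly like this
(paraphrased from the 4.x code; the exact conditions may differ):

	/*
	 * We want to default to C1 (hlt), not to busy polling,
	 * unless the timer is happening really really soon.
	 */
	if (data->next_timer_us > 5 &&
	    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
	    dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;

The patch raises that 5 to 20.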

>> Maybe I am misunderstanding your explanation, but that is not how I
>> read the code.

>> The default idle state is C1 (hlt) if no other state suits the
>> constraint. If a timer is happening really soon, then the default
>> idle state is set to POLL if no other idle state suits the constraint.
>>
>> That applies only on x86.

> With the current code, the default idle state is C1 (hlt) even if
> C1 does not suit the constraint.

>> This is not related to break-even but to exit latency.

> Why would we not care about break-even for C1?

> On systems where going into C1 for too-short periods wastes
> power, why would we waste that power when we expect a very
> short sleep?

>> IMO, we should just drop this 5us cut-off and the POLL state selection
>> in the menu governor, since we have had hyper-fast C1 exit for a while
>> now, except on a few embedded processors where polling is not adequate.

> We have hyper-fast C1 exit on Nehalem and newer high-performance
> chips. On those chips, we will pick C1 (or deeper) when we have
> an expected sleep time of just a few microseconds.

> However, on Atom, and for the paravirt cpuidle driver I am
> working on, C1 exit latency and target residency are higher
> than the cut-off hardcoded in the menu governor.

>> Furthermore, the number of times the poll state is selected vs. the
>> other states is negligible.

> And it will continue to be negligible with this patch, on CPUs with
> hyper-fast C1 exit.

> Which makes me confused about what you are objecting to,
> since the system should continue to behave the way you want
> with the patch applied.

Ok, I don't object to the correctness of your patch, but to the reasoning behind this small optimization, which brings a lot of mess into the cpuidle code.

As you are touching this part of the code, I take the opportunity to raise a discussion about it.

From my POV, the poll state is *not* an idle state. It is like a vehicle burnout [1].

But it is inserted into the idle state table via a trick with the CPUIDLE_DRIVER_STATE_START macro, which has already led us to some bugs.
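
For reference, the macro trick looks roughly like this in
include/linux/cpuidle.h (sketch; the exact config guard may differ):

	#ifdef CONFIG_ARCH_HAS_CPU_RELAX
	#define CPUIDLE_DRIVER_STATE_START	1
	#else
	#define CPUIDLE_DRIVER_STATE_START	0
	#endif

So on x86 the poll loop occupies index 0 of the state table and every
real idle state is shifted by one, which is exactly the kind of
off-by-one that is easy to get wrong.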

So instead of falling back to the poll state under certain circumstances, I propose we extract this state from the idle state table and let the menu governor fail to choose a state (or not).

From the caller, we then decide what to do (poll or C1) if the idle state selection fails, or we choose to poll *before* calling the governor, like what we already have in kernel/sched/idle.c:

In the idle loop:

	if (cpu_idle_force_poll || tick_check_broadcast_expired())
		cpu_idle_poll();
	else
		cpuidle_idle_call();
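
With the poll state out of the table, the fallback could then live in
the caller, along these lines (hypothetical sketch;
expected_sleep_is_very_short() is an illustrative helper, not an
existing function):

	next_state = cpuidle_select(drv, dev);
	if (next_state < 0) {
		/* The governor found no state satisfying the constraints. */
		if (expected_sleep_is_very_short())	/* hypothetical */
			cpu_idle_poll();
		else
			default_idle_call();
	} else {
		entered_state = cpuidle_enter(drv, dev, next_state);
	}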

This way, we:

1) factor out the idle state selection with find_deepest_idle_state
2) remove the CPUIDLE_DRIVER_STATE_START macro
3) concentrate the optimization logic outside of the governor, which
   will benefit all architectures

Does it make sense?

-- Daniel

[1] https://en.wikipedia.org/wiki/Burnout_%28vehicle%29


