Re: [PATCH] cpuidle: Allow configuration of the polling interval before cpuidle enters a c-state

From: Mel Gorman
Date: Fri Nov 27 2020 - 05:53:28 EST


On Thu, Nov 26, 2020 at 08:31:51PM +0000, Mel Gorman wrote:
> > > and it is reasonable behaviour but it should be tunable.
> >
> > Only if there is no way to cover all of the relevant use cases in a
> > generally acceptable way without adding more module params etc.
> >
> > In this particular case, it should be possible to determine a polling
> > limit acceptable to everyone.
> >
>
> Potentially yes. cpuidle is not my strong suit but it could try making
> the polling adaptive, similar to how the menu governor tries to guess
> the typical interval. Basically it would have to pick a polling interval
> between 2 and TICK_NSEC. Superficially, if a task is queued before polling
> finishes, decrease the interval and increase it otherwise. That is a mess
> though because then it may be polling for ages with nothing arriving. It
> would have to start tracking when the CPU exited idle to see if polling
> is even worthwhile. That
>
> I felt that starting with anything that tried adapting the polling
> interval based on heuristics would meet higher resistance than making it
> tunable. Hence, make it tunable so at least the problem can be addressed
> when it's encountered.
>

I looked at this again and determining a "polling limit acceptable
to everyone" looks like reimplementing haltpoll in the core or adding
haltpoll-like logic to each governor. I doubt that'll be a popular
approach.

The C1 exit latency as a hint is definitely too low though. I checked
one of the test machines to double check the granularity of the time
checks in poll_idle() by timing something like this at boot.

	for (i = 0; i < POLL_IDLE_RELAX_COUNT; i++) {
		cpu_relax();
	}

This takes roughly 1100ns on a test machine where the C1 exit latency is
2000ns. Let's say you have a basic pair of tasks communicating over a pipe
on the same machine (e.g. perf bench pipe). The time for a round-trip is
roughly 7000ns, meaning that polling is almost never useful for a basic
workload.


--
Mel Gorman
SUSE Labs