Re: [PATCH V3 2/2] sched: idle: IRQ based next prediction for idle period

From: Rafael J. Wysocki
Date: Fri Feb 19 2016 - 18:44:03 EST


On Fri, Feb 19, 2016 at 4:01 PM, Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:
> On 02/18/2016 07:57 PM, Rafael J. Wysocki wrote:
>>
>> On Thu, Feb 18, 2016 at 11:25 AM, Daniel Lezcano
>> <daniel.lezcano@xxxxxxxxxx> wrote:
>>>
>>> On 02/17/2016 11:21 PM, Rafael J. Wysocki wrote:
>>>
>>> [ ... ]
>>>
>>>>>> Reviewed-by: Nicolas Pitre <nico@xxxxxxxxxx>
>>>>>
>>>>>
>>>>>
>>>>> Well, I'm likely overlooking something, but how is this going to be
>>>>> hooked up to the code in idle.c?
>>>>
>>>>
>>>>
>>>> My somewhat educated guess is that sched_idle() in your patch is
>>>> intended to replace cpuidle_idle_call(), right?
>>>
>>>
>>>
>>> Well, no. I was planning to first have it to use a different code path as
>>> experimental code in order to focus improving the accuracy of the
>>> prediction
>>> and then merge or replace cpuidle_idle_call() with sched_idle().
>>
>>
>> In that case, what about making it a proper cpuidle governor that
>> people can test and play with in a usual way? Then it may potentially
>> benefit everybody and not just your experimental setup and you may get
>> coverage on systems you have no access to normally.
>>
>> There is some boilerplate code to add for this purpose, but that's not
>> that bad IMO.
>
>
> Hi Rafael,
>
> sorry for the delay in the responses.
>
> Actually, adding a new governor is precisely what I would like to avoid
> because the objective is the scheduler acts as the governor.

But why, really?

Well, first of all, I'm not sure what "the scheduler acts as the
governor" means. For lack of a better explanation I'll refer to
the message at https://lkml.org/lkml/2016/1/12/530 that you pointed me
at.

Your code in there does something like:

	if (sched_idle_enabled()) {
		int latency = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
		s64 duration = sched_idle_next_wakeup();
		sched_idle(duration, latency);
	} else {
		cpuidle_idle_call();
	}

which is quite questionable to be honest as it adds an extra branch to
the idle loop for no real benefit.

Now, what really is the difference between "governor" and "predictor"?
I don't quite see it except that the former is expected to provide a
specific interface.

The way the idle loop works now (and I'm not sure if you can really
change it) is that when you get into it, you're idle no matter what
and you simply need to choose an idle state for the CPU to go into.
Some code needs to select that state, regardless of what name you want
to give to that code.

In the current setup, which I really don't think is unreasonable, this
is done by cpuidle_select() that simply invokes the governor's
->select() callback and that's it. That callback may very well be
part of the scheduler and registered from there if you want that, but
why do you want to change the whole mechanism? What's wrong with it
now?

Further, if you look at your sched_idle(), it looks almost like
cpuidle_idle_call(). The few differences are really minor and look
arbitrary (apart from the fact that it doesn't cover suspend-to-idle,
which it will have to do eventually), and the "selection" if () in it
simply plays the role of the invocation of ->select(). So how is it
different really?

> Here, it is the 'predictor' and the API to enter an idle state conforming the idle duration
> and the latency constraint.

Isn't that just a simple rearrangement of the code? The latency still
comes from PM QoS and the duration is computed by your new code
instead of that being done by ->select() itself, but why can't
->select() simply call sched_idle_next_wakeup() to get the duration
value it needs? Why do those values need to be passed to a
cpuidle_idle_call() replacement as arguments? Is there any particular
technical reason for doing that?
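
To make the question concrete, here is a minimal user-space sketch of
the arrangement I have in mind. It is not kernel code: struct
idle_state, struct idle_governor, next_wakeup_prediction(),
latency_constraint(), irq_select() and idle_select() are all
hypothetical stand-ins (for the state table, the governor interface,
sched_idle_next_wakeup(), the PM QoS lookup, a ->select() callback and
cpuidle_select() respectively), and the selection policy is only an
assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of an idle state table. */
struct idle_state {
	int64_t target_residency_us;	/* minimum worthwhile idle time */
	int exit_latency_us;		/* worst-case wakeup latency */
};

/* Hypothetical stand-in for a governor; only ->select() is modelled. */
struct idle_governor {
	const char *name;
	int (*select)(const struct idle_state *states, size_t count);
};

/* Stub inputs standing in for the IRQ-based predictor and PM QoS. */
static int64_t predicted_idle_us = 500;
static int latency_limit_us = 100;

static int64_t next_wakeup_prediction(void)
{
	return predicted_idle_us;
}

static int latency_constraint(void)
{
	return latency_limit_us;
}

/* ->select() fetches both inputs itself; nothing is passed in as arguments. */
static int irq_select(const struct idle_state *states, size_t count)
{
	int64_t duration = next_wakeup_prediction();
	int latency = latency_constraint();
	int chosen = 0;
	size_t i;

	/* Assumes states are ordered from shallowest to deepest. */
	for (i = 1; i < count; i++) {
		if (states[i].target_residency_us > duration)
			break;
		if (states[i].exit_latency_us > latency)
			break;
		chosen = (int)i;
	}
	return chosen;
}

static const struct idle_governor irq_governor = {
	.name = "irq",
	.select = irq_select,
};

/* The idle loop side stays as it is: it only asks the governor. */
static int idle_select(const struct idle_governor *gov,
		       const struct idle_state *states, size_t count)
{
	return gov->select(states, count);
}

/* Made-up example table, shallowest state first. */
static const struct idle_state demo_states[] = {
	{ 0, 0 }, { 100, 10 }, { 1000, 50 }, { 5000, 500 },
};
```

In this arrangement the caller of idle_select() does not need to know
where the duration or the latency limit comes from, which is why I
don't see what passing them as arguments buys you.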

And why that name, sched_idle_next_wakeup()? Does that function
really have anything to do with the scheduler now?

> Concerning the testing, it is quite easy to switch from idle_sched to 'menu'
> via on sched_debug or whatever option we want to add.
>
>>
>> So I'm still unsure why you want to replace cpuidle_idle_call() with
>> sched_idle(). Is there anything wrong with it that it needs to be
>> replaced?
>
>
> I don't want to replace cpuidle_idle_call() with sched_idle(). How we
> integrate the API is something I would like to discuss with another patchset
> focused in this integration only.
>
> For reference: https://lkml.org/lkml/2016/1/12/530

Please answer my questions above. If you need to post a patchset for
this purpose, please do that.

I have to say that I was looking forward to the IRQ timings based
duration prediction, but the way you want to use it now is seriously
disappointing.

Thanks,
Rafael