Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

From: Nicolas Pitre
Date: Fri Jan 31 2014 - 13:19:36 EST

In reply to: Arjan van de Ven: "Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq"
Next in thread: Daniel Lezcano: "Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq"

On Fri, 31 Jan 2014, Arjan van de Ven wrote:

> On 1/31/2014 7:37 AM, Daniel Lezcano wrote:
> > On 01/31/2014 04:07 PM, Arjan van de Ven wrote:
> > > > > >
> > > > > > Hence I think this patch would make sense only if additional
> > > > > > information like exit_latency or target_residency is present for
> > > > > > the scheduler. The idle state index alone will not be sufficient.
> > > > >
> > > > > Alternatively, can we enforce sanity on the cpuidle infrastructure to
> > > > > make the index naturally ordered? If not, please explain why :-)
> > > >
> > > > The commit id 71abbbf856a0e70 says that there are SOCs which could have
> > > > their target_residency and exit_latency values change at runtime. This
> > > > commit thus removed the ordering of the idle states according to their
> > > > target_residency/exit_latency. Adding Len and Arjan to the CC.
> > >
> > > the ARM folks wanted a dynamic exit latency, so.... it makes much more
> > > sense to me to store the thing you want to use (exit latency) than the
> > > number of the state.
> > >
> > > more than that, you can order either by target residency OR by exit
> > > latency; if you sort by one, there is no guarantee that you're also
> > > sorted by the other
> >
> > IMO, it would be preferable to store the index for the moment as we are
> > integrating cpuidle with the scheduler. The index allows access to more
> > information. Then, when everything is fully integrated, we can improve
> > the result, no?
>
> more information, yes. but if the information isn't actually accurate
> (because it keeps changing in the data structure away from what it was
> for the cpu)... are you really achieving what you want?

Right now (on ARM at least, but I imagine this is pretty universal), the
biggest impact on information accuracy for a CPU comes from what the
other CPUs are doing. The most obvious example is cluster power down.
For a cluster to be powered down, all the CPUs sharing this cluster must
also be powered down. And all those CPUs must have agreed to a possible
cluster power down in advance as well. But the fact that an idle CPU has
agreed to the extra latency imposed by a cluster power down does not mean
that the cluster has actually powered down, since another CPU in that
cluster might still be running. In that case the recorded latency
information for that idle CPU would be higher than the latency it would
actually incur at that moment.
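
To make the gap between recorded and actual latency concrete, here is a
minimal C sketch. It is not kernel code: the structures, field names and
helper functions are all hypothetical, and it assumes the cluster latency
simply adds to the per-CPU exit latency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU idle bookkeeping; this is NOT the kernel's struct rq. */
struct cpu_idle_info {
	bool idle;                        /* CPU is currently idle */
	bool allows_cluster_off;          /* CPU agreed to a possible cluster power down */
	unsigned int cpu_exit_latency_us; /* latency to wake this CPU alone */
};

/* Hypothetical cluster description. */
struct cluster {
	const struct cpu_idle_info *cpus;
	size_t nr_cpus;
	unsigned int cluster_exit_latency_us; /* extra latency if the cluster is off */
};

/* The cluster can only be off if every CPU in it is idle and has agreed. */
static bool cluster_is_powered_down(const struct cluster *cl)
{
	assert(cl->nr_cpus > 0);
	for (size_t i = 0; i < cl->nr_cpus; i++) {
		if (!cl->cpus[i].idle || !cl->cpus[i].allows_cluster_off)
			return false;
	}
	return true;
}

/* Latency recorded when the CPU went idle: includes the cluster cost it agreed to. */
static unsigned int recorded_latency_us(const struct cluster *cl, size_t cpu)
{
	const struct cpu_idle_info *c = &cl->cpus[cpu];

	assert(cpu < cl->nr_cpus);
	return c->cpu_exit_latency_us +
	       (c->allows_cluster_off ? cl->cluster_exit_latency_us : 0);
}

/* Latency actually paid now: the cluster cost applies only if the cluster is off. */
static unsigned int effective_latency_us(const struct cluster *cl, size_t cpu)
{
	const struct cpu_idle_info *c = &cl->cpus[cpu];

	assert(cpu < cl->nr_cpus);
	return c->cpu_exit_latency_us +
	       (cluster_is_powered_down(cl) ? cl->cluster_exit_latency_us : 0);
}
```

With one CPU idle and agreeing to cluster power down while its sibling is
still running, recorded_latency_us() includes the cluster cost but
effective_latency_us() does not.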

A cluster should map naturally to a scheduling domain. If we need to
wake up a CPU, it is quite obvious that we should prefer an idle CPU
from a scheduling domain whose load is not zero. If the load is not
zero, then any idle CPU in that domain, even if it indicated it was
ready for a cluster power down, will not require the cluster power-up
latency, as some other CPU must still be running. But of course we
already know that, even if the recorded latency might not say so.

In other words, the hardware latency information is dynamic of course.
But we might not _need_ to have it reflected at the scheduler domain
level all the time, as in this case it can be inferred from the
scheduling domain load.

Within a scheduling domain it is OK to pick the best idle CPU by
looking at the index, as it is best to leave the CPUs that are ready for
a cluster power down in that state and prefer one which is not. And a
scheduling domain with a load of zero should be left alone if idle CPUs
are found in another domain whose load is not zero, irrespective of
absolute latency information. So all the existing heuristics already in
place to optimize cache utilization and so on will make things just work
for idle as well.
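
As a rough illustration of that selection policy, here is a small C
sketch. It is not the scheduler's actual code: the types and function
names are hypothetical, and it assumes that a lower idle state index
means a shallower state (one that is not a cluster power down
candidate), which the cpuidle core does not currently guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CPU (-1)

/* Hypothetical view of a CPU as seen by the selection code. */
struct cpu_sketch {
	int cpu;              /* CPU number */
	int is_idle;          /* non-zero if the CPU is idle */
	int idle_state_index; /* ASSUMPTION: lower index == shallower state */
};

/* Hypothetical stand-in for a scheduling domain (one cluster). */
struct domain_sketch {
	const struct cpu_sketch *cpus;
	size_t nr_cpus;
	unsigned int load;    /* zero means every CPU in the domain is idle */
};

/* Within a domain, pick the idle CPU with the lowest idle state index. */
static int best_idle_cpu_in_domain(const struct domain_sketch *sd)
{
	int best = NO_CPU;
	int best_index = 0;

	for (size_t i = 0; i < sd->nr_cpus; i++) {
		const struct cpu_sketch *c = &sd->cpus[i];

		if (!c->is_idle)
			continue;
		if (best == NO_CPU || c->idle_state_index < best_index) {
			best = c->cpu;
			best_index = c->idle_state_index;
		}
	}
	return best;
}

/*
 * Across domains, take an idle CPU from the first domain whose load is
 * not zero. A fully idle domain is used only as a fallback, so that it
 * can stay powered down when possible.
 */
static int pick_idle_cpu(const struct domain_sketch *domains, size_t nr_domains)
{
	int fallback = NO_CPU;

	assert(domains != NULL || nr_domains == 0);
	for (size_t i = 0; i < nr_domains; i++) {
		int cpu = best_idle_cpu_in_domain(&domains[i]);

		if (cpu == NO_CPU)
			continue;
		if (domains[i].load > 0)
			return cpu;
		if (fallback == NO_CPU)
			fallback = cpu;
	}
	return fallback;
}
```

Note that no absolute latency value appears anywhere here: the domain
load and the index are the only inputs.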

All this to say that it is not justified at the moment to worry about
how to convey the full details to the scheduler, and the complexity that
goes with it, since in practice we might be able to achieve our goal just
as well using simpler hints like some arbitrary index. Once this is in
place, we could look at the actual benefits of having more detailed
information and weigh them against the complexity that comes with it.

Nicolas