Re: [RFC PATCH v3 0/6] sched/cpufreq: Make schedutil energy aware

From: Peter Zijlstra
Date: Fri Oct 18 2019 - 08:07:37 EST


On Fri, Oct 18, 2019 at 12:46:25PM +0100, Douglas Raillard wrote:

> > What I don't see is how that that difference makes sense as input to:
> >
> > cost(x) : (1 + x) * cost_j
>
> The actual input is:
> x = (EM_COST_MARGIN_SCALE/SCHED_CAPACITY_SCALE) * (util - util_est)
>
> Since EM_COST_MARGIN_SCALE == SCHED_CAPACITY_SCALE == 1024, this factor of 1
> is not directly reflected in the code but is important for units
> consistency.

But completely irrelevant for the actual math and conceptual
understanding. Just because computers suck at real numbers, and floats
are expensive, doesn't mean we have to burden ourselves with fixed point
when writing equations.

Also, as a physicist I'm prone to normalizing everything to 1, because
that's lazy.

> > I suppose that limits the additional OPP to twice the previously
> > selected cost / efficiency (see the confusion from that other email).
> > But given that efficency drops (or costs rise) for higher OPPs that
> > still doesn't really make sense..

> Yes, this current limit to +100% freq boosting is somehow arbitrary and
> could probably benefit from being tunable in some way (Kconfig option
> maybe). When (margin > 0), we end up selecting an OPP that has a higher cost
> than the one strictly required, which is expected. The goal is to speed
> things up at the expense of more power consumed to achieve the same work,
> hence at a lower efficiency (== higher cost).

No, no Kconfig knobs.

> That's the main reason why this boosting apply a margin on the cost of the
> selected OPP rather than just inflating the util. This allows controlling
> directly how much more power (battery life) we are going to spend to achieve
> some work that we know could be achieved with less power.

But you're not; the margin is relative to the OPP, it is not absolute.

Or rather, the only actual limit is in relation to the max OPP. So you
have very little actual control over how much more energy you're
spending.

> > So while I agree that 2) is a reasonable signal to work from, everything
> > that comes after is still much confusing me.

> "When applying these boosting rules on the runqueue util signals ...":
> Assuming the set of enqueued tasks stays the same between 2 observations
> from schedutil, if we see the rq util_avg increase above its
> util_est.enqueued, that means that at least one task had its util_avg go
> above util_est.enqueued. We might miss some boosting opportunities if some
> (util - util_est) compensates:
> TASK_1(util - util_est) = - TASK_2(util - util_est)
> but working on the aggregated value is much easier in schedutil, to avoid
> crawling the list of entities.

That still does not explain why 'util - util_est', when >0, makes for a
sensible input into an OPP relative function.

I agree that 'util - util_est', when >0, indicates utilization is
increasing (for the aperiodic blah blah blah). But after that I'm still
confused.