Re: [PATCH 2/4] sched: cpufreq: Fix apply_dvfs_headroom() escaping uclamp constraints

From: Vincent Guittot
Date: Sun Sep 24 2023 - 03:58:18 EST


On Sat, 16 Sept 2023 at 21:25, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> On 09/12/23 18:03, Vincent Guittot wrote:
>
> > And it seems that what is done today doesn't work correctly for you.
> > Your proposal to include cpufreq headroom into the scheduler is not
> > correct IMO and it only applies for some cases. Then, the cpufreq
> > driver can have some really good reason to apply some headroom even
> > with an uclamp value but it can't make any decision.
> >
> > I think that we should use another way to fix your problem with how
> > uclamp than reordering how headroom is applied by cpufreq. Mixing
> > utilization and performance in one signal hide some differences that
> > cpufreq can make a good use of.
> >
> > As an example:
> >
> > cfs util = 650
> > cfs uclamp = 800
> > irq = 128
> >
> > cfs with headroom 650*1.25=812 is clamped to 800
> >
> > Final utilization will be : 800(1024-128)/1024+128*1.25=860 which is
> > above the target of 800.
> >
> > When we look at the detail, we have:
> >
> > cfs util once scaled to the full range is only 650(1024-128)/1024= 568
> >
> > After applying irq (even with some headroom) 568+128*1.25 = 728 which
> > is below the uclamp of 800 so shouldn't we stay at 800 in this case ?
>
> Shouldn't it be (568+128)*1.25 = 870? Which is almost the 860 above. We calmped
> the 812 to 800, with rounding errors that almost accounts for the 10 points
> difference between 870 and 860..

no I voluntarily use 568 + 128*1.25. I added dvfs headroom for irq
just to ensure that you will not raise that I removed the headroom for
irq and focus on the use case but it might have created more
confusion.

My example above demonstrate that only taking care of cases with null
irq pressure is not enough and you can still ends up above 800

IIUC you point with uclamp_max. It is a performance limit that you
don't want to cross because of CFS.This means that we should not go
above 800 in my example because of cfs utilization: Irq needs between
128 and CFS asks 568 so the system needs 696 which is below the 800
uclamp. Even if you add the dvfs headroom on irq, the system is still
below 800. Only when you add dfvs headroom to cfs then you go above
800 but it's not needed because uclamp say that you should not go
above 800 because of CFS so we should stay at 800 whereas both current
formula and your new formula return a value above 800

>
> I might have gotten the math wrong. But what I saw is that we have
>
> util = (X + Y + Z) * A
>
> and what I did
>
> util = AX + AY + AZ
>
> so maybe I missed something up, but I just did the multiplication with the
> headroom to each element individually rather than after the sum.
>
> So yeah, if I messed that part up, then that wasn't intentional and should be
> done differently. But I still can't see it.
>
> > >
> > > The main change being done here actually is to apply_dvfs_headroom() *before*
> > > doing uclamp_rq_util_with(). I am not sure how you see this mixing.
> >
> > Because dvfs_headroom is a cpufreq hints and you want to apply it
> > somewhere else.
>
> I am still not sure if you mean we are mixing up the code and we need better
> abstraction or something else.
>
> Beside the abstraction problem, which I agree with, I can't see what I am
> mixing up yet :( Sorry I think I need more helping hand to see it.

There is a mix between actual utilization and performance limit and
when we add both we then lose important information as highlighted by
my example. If the current formula is not correct because we can go
above uclamp_max value, your proposal is not better. And the root
cause is mainly coming from adding utilization with performance limit
(i.e. uclamp)

That's why I said that we need a new interface to enable cpufreq to
not blindly apply its headroom but to make smarter decision at cpufreq
level


>
> > > Current code performs apply_dvfs_headroom() *after*; which what causes the CPU
> > > to run at a performance level higher than rq->uclamp[UCLAMP_MAX].
> > >
> > > It doesn't matter how many tasks on the rq, if rq->uclamp[UCLAMP_MAX] is set to
> > > 800, then the CPU should not vote to max (assuminig all other pressures are 0).
> >
> > You can't remove the irq pressure from the picture. If
> > rq->uclamp[UCLAMP_MAX] is set to 800 means that cpu must not go above
> > 800, it should apply also after taking into account other inputs. At
> > least up to some level as described in my example above
>
> I was trying to simplify to understand what you mean as I don't think I see the
> problem you're highlighting still.
>
>
> Thanks!
>
> --
> Qais Yousef