Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

From: Juri Lelli
Date: Thu Mar 16 2017 - 13:17:14 EST


On 16/03/17 09:58, Joel Fernandes wrote:
> On Thu, Mar 16, 2017 at 5:44 AM, Juri Lelli <juri.lelli@xxxxxxx> wrote:
> > On 16/03/17 12:27, Patrick Bellasi wrote:
> >> On 16-Mar 11:16, Juri Lelli wrote:
> >> > On 15/03/17 16:40, Joel Fernandes wrote:
> >> > > On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli <juri.lelli@xxxxxxx> wrote:
> >> > > [..]
> >> > > >
> >> > > >> > However, trying to quickly summarize how that would work (for those
> >> > > >> > already somewhat familiar with the reclaiming bits):
> >> > > >> >
> >> > > >> > - a task utilization contribution is accounted for (at rq level) as
> >> > > >> > soon as it wakes up for the first time in a new period
> >> > > >> > - its contribution is then removed after the 0lag time (or when the
> >> > > >> > task gets throttled)
> >> > > >> > - frequency transitions are triggered accordingly
> >> > > >> >
> >> > > >> > So, I don't see why triggering a go-down request after the 0lag time
> >> > > >> > has expired, and quickly reacting to tasks waking up, would create
> >> > > >> > problems in your case?
> >> > > >>
> >> > > >> In my experience, the 'reacting to tasks' bit doesn't work very well.
> >> > > >
> >> > > > Humm.. but in this case we won't be 'reacting', we will be
> >> > > > 'anticipating' tasks' needs, right?
> >> > >
> >> > > Are you saying we will start ramping frequency before the next
> >> > > activation so that we're ready for it?
> >> > >
> >> >
> >> > I'm saying that there is no need to ramp, simply select the frequency
> >> > that is needed for a task (or a set of them).
> >> >
> >> > > If not, it sounds like it will only make the frequency request on the
> >> > > next activation, when the active bandwidth increases due to the task
> >> > > waking up. By then the task has already started to run, right?
> >> > >
> >> >
> >> > When the task is enqueued back we select the frequency considering its
> >> > bandwidth request (and the bandwidth/utilization of the others). So,
> >> > when it actually starts running it will already have enough capacity to
> >> > finish in time.
> >>
> >> Here we are factoring out the time required to actually switch to the
> >> required OPP. I think Joel was referring to this time.
> >>
>
> Yes, that's what I meant.
>
> >
> > Right. But this is an HW limitation. It seems to be a problem that every
> > scheduler-driven decision will have to take into account. So, doesn't it
> > make more sense to let the driver (or the governor shim layer) introduce
> > some sort of hysteresis to frequency changes, if needed?
>
> The problem which IMO hysteresis in the governor will not help with is:
> what if you had a DL task that doesn't wake up for several periods and
> then wakes up? For that wakeup, we would still be subject to the HW
> limitation of the time taken to switch to the needed OPP. Right?
>

True, but in this case the problem is that you cannot really predict the
future anyway. So, if your HW is so slow to react that it always causes
latency problems, then I guess you'll be forced to statically raise your
min_freq value to cope with that HW limitation, independently from
scheduling policies/heuristics?

OTOH, hysteresis, when properly tuned, should cover the 'normal' cases.

> >> That time cannot really be eliminated other than by having faster
> >> OPP-switching HW support. Still, jumping straight to the "optimal" OPP
> >> instead of ramping up is a big improvement.
>
> Yes I think so.
>
> Thanks,
> Joel