Re: [PATCH v2 1/2] sched/fair: Take thermal pressure into account while estimating energy

From: Vincent Guittot
Date: Thu Jun 10 2021 - 08:40:48 EST


On Thu, 10 Jun 2021 at 14:30, Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
>
>
>
> On 6/10/21 1:19 PM, Vincent Guittot wrote:
> > On Thu, 10 Jun 2021 at 12:37, Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
> >>
> >>
> >>
> >> On 6/10/21 11:07 AM, Dietmar Eggemann wrote:
> >>> On 10/06/2021 11:04, Lukasz Luba wrote:
> >>>>
> >>
> >> [snip]
> >>
> >>>> Not always, it depends on thermal governor decision, workload and
> >>>> 'power actors' (in IPA naming convention). Then it depends when and how
> >>>> hard you clamp the CPUs. They (CPUs) don't have to be always
> >>>> overutilized, they might be even 50-70% utilized but the GPU reduced
> >>>> power budget by 2 Watts, so CPUs left with only 1W. Which is still OK
> >>>> for the CPUs, since they are only 'feeding' the GPU with new 'jobs'.
> >>>
> >>> All this pretty much confines the usefulness of you proposed change. A
> >>> precise description of it with the patches is necessary to allow people
> >>> to start from there while exploring your patches.
> >>
> >> OK, I see your point.
> >>
> >> [snip]
> >>
> >>>> True, I hope this description above would help to understand the
> >>>> scenario.
> >>>
> >>> This description belongs in the patch header. The scenario in which your
> >>> functionality would improve things has to be clear.
> >>> I'm sure that not everybody looking at this patches is immediately aware
> >>> on how IPA setups work and which specific setup you have in mind here.
> >>
> >> Agree. I will add this description into the patch header for v3.
> >>
> >> [snip]
> >>
> >>>>
> >>>> Yes, this code implementation tries to address those issues.
> >>>
> >>> The point I was making here is: why using the PELT signal
> >>> thermal_load_avg() and not per_cpu(thermal_pressure, cpu) directly,
> >>> given the fact that the latter perfectly represents the frequency clamping?
> >>>
> >>
> >> Good question. I wanted to be aligned with other parts in the fair.c
> >> like cpu_capacity() and all it's users. The CPU capacity is reduced by
> >> RT, DL, IRQ and thermal load avg, not the 'raw' value from the
> >> per-cpu variable.
> >>
> >> TBH I cannot recall what was the argument back then
> >> when thermal pressure geometric series was introduced.
> >> Maybe to have a better control how fast it raises and decays
> >> so other mechanisms in the scheduler will see the change in thermal
> >> as not so sharp... (?)
> >>
> >>
> >> Vincent do you remember the motivation to have geometric series
> >> in thermal pressure and not use just the 'raw' value from per-cpu?
> >
> > In order to have thermal pressure synced with other metrics used by
> > the scheduler like util/rt/dl_avg. As an example, when thermal
> > pressure will decrease because thermal capping is removed, the
> > utilization will increase at the same pace as thermal will decrease
> > and it will not create some fake spare cycle. util_avg is the average
> > expected utilization of the cpu, thermal pressure reflects the average
> > stolen capacity for the same averaging time scale but this can be the
> > result of a toggle between several OPP
> >
> > Using current capping is quite volatile to make a decision as it might
> > have changed by the time you apply your decision.
> >
>
> So for this scenario, where we want to just align EAS with SchedUtil
> frequency decision, which is instantaneous and has 'raw' value
> of capping from policy->max, shouldn't we use:
>
> thermal_pressure = arch_scale_thermal_pressure(cpu_id)

Yes you should probably use arch_scale_thermal_pressure(cpu) instead
of thermal_load_avg(rq) in this case



>
> instead of geometric series thermal_pressure signal?
>
> This would avoid the hassling with idle CPUs and not updated
> thermal signal.