Re: [PATCH V2 0/3] Introduce Thermal Pressure

From: Ingo Molnar
Date: Wed Apr 17 2019 - 14:29:40 EST



* Thara Gopinath <thara.gopinath@xxxxxxxxxx> wrote:

>
> On 04/17/2019 01:36 AM, Ingo Molnar wrote:
> >
> > * Thara Gopinath <thara.gopinath@xxxxxxxxxx> wrote:
> >
> >> The test results below show a 3-5% improvement in performance when
> >> using the third solution compared to the default system today, where
> >> the scheduler is unaware of CPU capacity limitations due to thermal
> >> events.
> >
> > The numbers look very promising!
>
> Hello Ingo,
> Thank you for the review.
> >
> > I've rearranged the results to make the performance properties of the
> > various approaches and parameters easier to see:
> >
> > (seconds, lower is better)
> >
> >                                        Hackbench   Aobench   Dhrystone
> >                                        =========   =======   =========
> > Vanilla kernel (No Thermal Pressure)       10.21    141.58       1.14
> > Instantaneous thermal pressure             10.16    141.63       1.15
> > Thermal Pressure Averaging:
> >  - PELT fmwk                                9.88    134.48       1.19
> >  - non-PELT Algo. Decay : 500 ms            9.94    133.62       1.09
> >  - non-PELT Algo. Decay : 250 ms            7.52    137.22       1.012
> >  - non-PELT Algo. Decay : 125 ms            9.87    137.55       1.12
> >
> >
> > Firstly, a couple of questions about the numbers:
> >
> > 1)
> >
> > Is the 1.012 result for "non-PELT 250 msecs Dhrystone" really 1.012?
> > You reported it as:
> >
> > non-PELT Algo. Decay : 250 ms 1.012 7.02%
>
> It is indeed 1.012. So, I ran the "non-PELT Algo 250 ms" benchmarks
> multiple times because of the anomalies noticed. 1.012 is a formatting
> error on my part from when I copy-pasted the results into a Google sheet
> I am maintaining to capture the test results. Sorry about the confusion.

That's actually pretty good, because it suggests a 35% and 15%
improvement over the vanilla kernel - which is very good for such
CPU-bound workloads.

Not that 5% is bad in itself - but 15% is better ;-)

> Regarding the decay period, I agree that more testing can be done. I
> like your suggestions below and I am going to try implementing them
> sometime next week. Once I have some solid results, I will send them
> out.

Thanks!

> My concern with getting hung up too much on the decay period is that I
> think it could vary from SoC to SoC depending on the type and number of
> cores and the thermal characteristics. So I was thinking that eventually
> the decay period should be configurable via a config option or by some
> other means. Testing on different systems will definitely help, and
> maybe I am wrong and there is not much variation between systems.

Absolutely, so I'd not be against keeping it a SCHED_DEBUG tunable or so,
until there's a better understanding of how the physical properties of
the SoC map to an ideal decay period.
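
Just to make the discussion concrete, below is a toy user-space sketch of
the kind of decaying average being talked about, with the decay period as
a plain variable standing in for whatever tunable ends up being exposed.
The names and constants are mine and purely illustrative; this is not the
patch's implementation, just a model of a first-order decaying average:

/*
 * Toy model of a decaying thermal-pressure average (illustrative only).
 * The average converges toward the instantaneous pressure with a time
 * constant of roughly decay_period_ms.
 */
#include <stdio.h>

#define SAMPLE_PERIOD_MS	32	/* assumed update interval */

static long decay_period_ms = 250;	/* the tunable under discussion */
static long avg_pressure;		/* decayed average, capacity units */

static void update_pressure(long capped_capacity, long max_capacity)
{
	long instant = max_capacity - capped_capacity;

	/* avg += (instant - avg) * dt / decay_period */
	avg_pressure += (instant - avg_pressure) * SAMPLE_PERIOD_MS
			/ decay_period_ms;
}

int main(void)
{
	/* cap capacity at ~80% for 1 second, then lift the cap */
	for (long t = 0; t < 2000; t += SAMPLE_PERIOD_MS) {
		update_pressure(t < 1000 ? 819 : 1024, 1024);
		printf("t=%4ld ms  avg pressure=%ld\n", t, avg_pressure);
	}
	return 0;
}

A PELT-style geometric series obviously behaves differently in the tail
(and this toy version stalls on small residues due to integer truncation),
but for eyeballing how the 125/250/500 ms choices trade responsiveness
against stability a model like this is probably enough.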

Assuming PeterZ & Rafael & Quentin don't hate the whole thermal load
tracking approach. I suppose there's some connection of this to Energy
Aware Scheduling? Or not ...

Thanks,

Ingo