Re: cpufreq: intel_pstate: map utilization into the pstate range

From: Francisco Jerez
Date: Tue Jan 04 2022 - 19:38:46 EST


Julia Lawall <julia.lawall@xxxxxxxx> writes:

> On Mon, 3 Jan 2022, Rafael J. Wysocki wrote:
>
>> On Mon, Jan 3, 2022 at 7:23 PM Julia Lawall <julia.lawall@xxxxxxxx> wrote:
>> >
>> > > > > Can you please run the 32 spinning threads workload (ie. on one
>> > > > > package) and with P-state locked to 10 and then to 20 under turbostat
>> > > > > and send me the turbostat output for both runs?
>> > > >
>> > > > Attached.
>> > > >
>> > > > Pstate 10: spin_minmax_10_dahu-9_5.15.0freq_schedutil_11.turbo
>> > > > Pstate 20: spin_minmax_20_dahu-9_5.15.0freq_schedutil_11.turbo
>> > >
>> > > Well, in both cases there is only 1 CPU running and it is running at
>> > > 1 GHz (ie. P-state 10) all the time as far as I can say.
>> >
>> > It looks better now. I included 1 core (core 0) for pstates 10, 20, and
>> > 21, and 32 cores (socket 0) for the same pstates.
>>
>> OK, so let's first consider the runs where 32 cores (entire socket 0)
>> are doing the work.
>>
>> This set of data clearly shows that running the busy cores at 1 GHz
>> takes less energy than running them at 2 GHz (the ratio of these
>> numbers is roughly 2/3 if I got that right). This means that P-state
>> 10 is more energy efficient than P-state 20, as expected.

Uhm, that's not what I'm seeing, Rafael.

>
> Here all the threads always spin for 10 seconds. But if they had a fixed
> amount of work to do, they should finish twice as fast at pstate 20.
> Currently, we have 708J at pstate 10 and 905J at pstate 20, but if we can
> divide the time at pstate 20 by 2, we should be around 450J, which is much
> less than 708J.
>

I agree with Julia on this: According to the last turbostat logs
attached to this thread, CPU package #0 consumes 618 J for 32 threads
spinning at 2GHz for 10s, and 421 J for the same number of threads
spinning at 1GHz for roughly the same time, therefore at P-state 10 we
observe an energy efficiency (based on Rafael's own definition of energy
efficiency elsewhere in this thread) of 1GHz*10s/421J = 24 Mclocks/J,
while at P-state 20 we observe an energy efficiency of 2GHz*10s/618J =
32 Mclocks/J, so P-state 20 is clearly the more energy-efficient of the two in
Julia's setup, even if we only consider one of the CPU packages in her
system (considering the effect of the second CPU package would further
bias the result in favor of P-state 20).
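A minimal sketch of the efficiency comparison above. It assumes the metric is clock cycles delivered per joule of package energy, computed from the per-core frequency. Both runs have the same 32 spinning threads, so the thread count cancels out of the comparison. The function name is hypothetical.

```python
def mclocks_per_joule(freq_ghz: float, seconds: float, joules: float) -> float:
    """Clock cycles delivered per joule of package energy, in millions."""
    clocks = freq_ghz * 1e9 * seconds
    return clocks / joules / 1e6


# Package #0 energy from the turbostat logs (32 threads spinning for ~10 s)
pstate10 = mclocks_per_joule(1.0, 10.0, 421.0)
pstate20 = mclocks_per_joule(2.0, 10.0, 618.0)

print(f"P-state 10: {pstate10:.1f} Mclocks/J")  # P-state 10: 23.8 Mclocks/J
print(f"P-state 20: {pstate20:.1f} Mclocks/J")  # P-state 20: 32.4 Mclocks/J
```

Rounded to whole numbers, these are the 24 and 32 Mclocks/J figures given above.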

Since her latest experiment is utilizing all 16 cores of the package
close to 100% of the time, I think this rules out our earlier theory
that this is the result of broken idle management. Two alternative
explanations I can think of are:

- Voltage scaling isn't functioning as expected: your CPU's reported
maximum efficiency ratio may have been calculated on the assumption
that your CPU would run at a lower voltage around P-state 10, which
for some reason isn't the case on your system.

- MSR_PLATFORM_INFO is misreporting the maximum efficiency ratio as
suggested earlier.
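The second explanation can be checked by reading the register directly. The following is a minimal sketch that assumes a Linux system with the `msr` driver loaded and root access. It relies on the Intel SDM layout of MSR_PLATFORM_INFO (register 0xCE): the maximum non-turbo ratio sits in bits 15:8 and the maximum efficiency ratio in bits 47:40. Ratios are assumed to be in units of the 100 MHz bus clock, which is how P-state 10 corresponds to 1 GHz in this thread. Both helper names are hypothetical.

```python
import os
import struct

MSR_PLATFORM_INFO = 0xCE  # register address per the Intel SDM


def decode_platform_info(raw: int) -> dict:
    """Extract the ratio fields from a raw 64-bit MSR_PLATFORM_INFO value."""
    return {
        # Bits 15:8: maximum non-turbo ratio
        "max_non_turbo_ratio": (raw >> 8) & 0xFF,
        # Bits 47:40: maximum efficiency ratio
        "max_efficiency_ratio": (raw >> 40) & 0xFF,
    }


def read_msr(cpu: int, reg: int) -> int:
    """Read a 64-bit MSR via /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]
```

Running `decode_platform_info(read_msr(0, MSR_PLATFORM_INFO))` as root would show whether the reported efficiency ratio really is 10 on Julia's machine. Decoding is kept separate from the privileged read so that it can be tested on its own.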

> turbostat -J sleep 5 shows 105J, so we're still ahead.
>
> I haven't yet tried the actual experiment of spinning for 5 seconds and
> then sleeping for 5 seconds, though.
>
>>
>> However, the cost of running at 2.1 GHz is much greater than the cost
>> of running at 2 GHz and I'm still thinking that this is attributable
>> to some kind of voltage increase between P-state 20 and P-state 21
>> (which, interestingly enough, affects the second "idle" socket too).
>>
>> In the other set of data, where only 1 CPU is doing the work, P-state
>> 10 is still more energy-efficient than P-state 20,
>
> Actually, this doesn't seem to be the case. It's surely due to the
> approximation of the result, but the consumption is slightly lower for
> pstate 20. With more runs it probably averages out to around the same.
>

Yeah, I agree that the data seems to confirm P-state 20 being truly more
efficient than P-state 10, whether 1 or 16 cores are in use.
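Julia's fixed-work argument can be made concrete with a rough sketch based on the totals she quoted: 708 J and 905 J for 10 s of spinning, and 105 J for `turbostat -J sleep 5`. The sketch makes two assumptions. First, work that keeps the cores busy for 10 s at 1 GHz finishes in 5 s at 2 GHz. Second, average power is constant within each busy or idle phase. The names are hypothetical.

```python
WINDOW_S = 10.0  # length of the comparison window, in seconds


def energy_in_window(busy_power_w: float, busy_s: float, idle_power_w: float) -> float:
    """Energy over WINDOW_S seconds: busy for busy_s, idle for the remainder."""
    return busy_power_w * busy_s + idle_power_w * (WINDOW_S - busy_s)


# Average power derived from the totals Julia quoted
p10_power = 708.0 / 10.0   # W while spinning at P-state 10
p20_power = 905.0 / 10.0   # W while spinning at P-state 20
idle_power = 105.0 / 5.0   # W from `turbostat -J sleep 5`

# Same amount of work: 10 s at 1 GHz versus 5 s at 2 GHz followed by 5 s idle
e10 = energy_in_window(p10_power, 10.0, idle_power)
e20 = energy_in_window(p20_power, 5.0, idle_power)
print(f"P-state 10: {e10:.1f} J, P-state 20 + idle: {e20:.1f} J")
```

Under these assumptions, racing to idle at P-state 20 costs about 557.5 J against 708 J at P-state 10. This is the "we're still ahead" result.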

> julia
>
>> but it takes more
>> time to do the work at 1 GHz, so the energy lost due to leakage
>> increases too and it is "leaked" by all of the CPUs in the package
>> (including the idle ones in core C-states), so overall this loss
>> offsets the gain from using a more energy-efficient P-state. At the
>> same time, socket 1 can spend more time in PC2 when the busy CPU is
>> running at 2 GHz (which means less leakage in that socket), so with 1
>> CPU doing the work the total cost of running at 2 GHz is slightly
>> smaller than the total cost of running at 1 GHz. [Note how important
>> it is to take the other CPUs in the system into account in this case,
>> because there are simply enough of them to affect one-CPU measurements
>> in a significant way.]
>>
>> Still, when going from 2 GHz to 2.1 GHz, the voltage jump causes the
>> energy to increase significantly again.
>>