Re: [PATCH 6/7] thermal: netlink: Add a new event to notify CPU capabilities change

From: Lukasz Luba
Date: Tue Nov 09 2021 - 12:51:11 EST




On 11/9/21 2:15 PM, Srinivas Pandruvada wrote:
On Tue, 2021-11-09 at 13:53 +0000, Lukasz Luba wrote:
Hi Srinivas,

On 11/9/21 1:23 PM, Srinivas Pandruvada wrote:
Hi Lukasz,

On Tue, 2021-11-09 at 12:39 +0000, Lukasz Luba wrote:
Hi Ricardo,


On 11/6/21 1:33 AM, Ricardo Neri wrote:
From: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>

Add a new netlink event to notify change in CPU capabilities in terms
of performance and efficiency.

Is this going to be handled by some 'generic' tools? If yes, maybe
the values for 'performance' could be aligned with capacity [0,1024]?
Or are they completely unrelated, so the mapping is simply impossible?


That would have been very useful.

The problem is that we may not know the maximum performance, as the
system may boot with only a few CPUs (using the maxcpus kernel command
line parameter) and the user may hot-add the rest later. So we may need
to rescale when we get a new maximum-performance CPU and send the
rescaled values to user space.

We can't just use the max from the HFI table at a given instant, as the
HFI table does not necessarily contain data for all CPUs.

If the HFI max performance value of 255 were scaled to the
max-performance CPU in the system, then this conversion would have
been easy. But it is not.
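
To illustrate the rescaling problem described above, here is a minimal
sketch, not code from the patch. It assumes a hypothetical helper,
perf_to_capacity(), that maps a raw performance value onto the
[0,1024] capacity scale relative to the highest raw value seen so far.
The helper name and the max_seen parameter are assumptions made only
for this example.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* capacity scale mentioned in this thread */

/*
 * Hypothetical helper: map a raw performance value to [0, 1024],
 * relative to the highest raw value seen so far among known CPUs.
 */
static uint32_t perf_to_capacity(uint32_t raw_perf, uint32_t max_seen)
{
	assert(max_seen > 0 && raw_perf <= max_seen);
	return raw_perf * CAPACITY_SCALE / max_seen;
}
```

With only a CPU of raw performance 100 known, that CPU maps to 1024.
Once a CPU with raw performance 200 is hot-added, the same CPU maps to
512, so every value already sent to user space would have to be
recomputed and sent again.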

I see. I was asking because I'm working on a similar interface and
just wanted to understand your approach better. In my case we
would probably simply use the 'capacity' scale, or more
precisely the available capacity after subtracting the 'thermal
pressure' value.
That might confuse a generic tool which listens to these socket
messages, though. So I would probably have to add a new
THERMAL_GENL_ATTR_CPU_CAPABILITY_* id
to handle this different scale, which is normalized across CPUs.
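
As a rough sketch of the 'available capacity' idea above (an
illustration under stated assumptions, not the actual interface), the
value could be computed as below. The function name is hypothetical,
and clamping at zero is an assumption of this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: capacity left after subtracting thermal
 * pressure, both expressed on the same [0, 1024] scale.
 */
static uint32_t available_capacity(uint32_t capacity, uint32_t thermal_pressure)
{
	/* Clamp at zero instead of wrapping if pressure exceeds capacity. */
	return thermal_pressure >= capacity ? 0 : capacity - thermal_pressure;
}
```

For example, a CPU with capacity 1024 under a thermal pressure of 224
would report 800.
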
I can add a field capacity_scale. In the HFI case it will always be
255. In your case it will be 1024.
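
A minimal sketch of how a generic user-space tool might use such a
scale field, so that it does not hard-code either 255 or 1024. The
struct and its field names are hypothetical and do not reflect the
netlink attribute layout of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space view of one event; names are illustrative. */
struct cpu_capability {
	uint32_t performance;	/* raw value reported for the CPU */
	uint32_t scale;		/* upper limit: 255 for HFI, 1024 for capacity */
};

/* Express the value in parts per thousand of the advertised scale. */
static uint32_t performance_permille(const struct cpu_capability *cap)
{
	assert(cap->scale > 0 && cap->performance <= cap->scale);
	return cap->performance * 1000u / cap->scale;
}
```

A tool written this way treats 51 on a 255 scale as 200 per mille and
512 on a 1024 scale as 500 per mille, without knowing which source
produced the event.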



Sounds good. With that upper limit, those tools would not build
up assumptions (they would have to parse that scale value).
However, I would prefer to call it 'performance_scale', if you don't
mind.
I did a similar renaming, s/capacity/performance/, in the Energy Model
(EM) some time ago [1]. Some reasons:
- in the scheduler we have 'Performance Domains (PDs)'
- for GPUs we talk about 'performance', because 'capacity' sounds odd
in that case

[1] https://lore.kernel.org/linux-pm/20200527095854.21714-2-lukasz.luba@xxxxxxx/