Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework

From: Daniel Lezcano
Date: Mon Oct 12 2020 - 12:03:06 EST


On 12/10/2020 13:46, Hans de Goede wrote:
> Hi Daniel,
>
> On 10/12/20 12:30 PM, Daniel Lezcano wrote:
>>
>> Hi Hans,
>>
>> On 07/10/2020 12:43, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 10/6/20 2:20 PM, Daniel Lezcano wrote:
>>>> The density of components has greatly increased over the last decade,
>>>> bringing numerous heat sources which are monitored by more than 20
>>>> sensors on recent SoCs. The skin temperature, which is the case
>>>> temperature of the device, must stay below approximately 45°C in order
>>>> to comply with the legal requirements.
>>>>
>>>> The skin temperature is managed as a whole by a user space daemon,
>>>> which catches the current application profile in order to allocate a
>>>> power budget to the different components such that the resulting
>>>> heating effect complies with the skin temperature constraint.
>>>>
>>>> This technique is called the Dynamic Thermal Power Management.
>>>>
>>>> The Linux kernel does not provide any unified interface to act on the
>>>> power of the different devices. Currently, the thermal framework is
>>>> changed to export artificially the performance states of different
>>>> devices via the cooling device software component with opaque values.
>>>> This change is done regardless of the in-kernel logic to mitigate the
>>>> temperature. The user space daemon uses all the available knobs to act
>>>> on the power limit and those differ from one platform to another.
>>>>
>>>> This series provides a Dynamic Thermal Power Management framework to
>>>> provide a unified way to act on the power of the devices.
>>>
>>> Interesting, we have a discussion going on about a related
>>> (while at the same time almost orthogonal) topic: setting
>>> policies for whether the code managing the restraints
>>> (which on x86 is often hidden in firmware or ACPI DPTF tables)
>>> should have a bias towards trying to have as long a battery life
>>> as possible, vs maximum performance. I know those 2 aren't
>>> always opposite ends of a spectrum with race-to-idle, yet most
>>> modern x86 hardware has some notion of what I call performance-profiles
>>> where we can tell the firmware managing this to go for a bias towards
>>> low-power / balanced / performance.
>>>
>>> I've sent an RFC / sysfs API proposal for this here:
>>> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@xxxxxxxxxx/
>>>
>>>
>>> I've read the patches in this thread and as said already I think
>>> the 2 APIs are mostly orthogonal. The API in this thread is giving
>>> userspace direct access to detailed power-limits allowing userspace
>>> to configure things directly (and for things to work optimally
>>> userspace must do this). Whereas in the x86 case with which I'm
>>> dealing everything is mostly handled in a black-box and userspace
>>> can merely configure the low-power / balanced / performance bias (*)
>>> of that black-box.
>>>
>>> Still I think it is good if we are aware of each other's efforts here.
>>>
>>> So Daniel, if you can take a quick look at my proposal:
>>> https://lore.kernel.org/linux-pm/20201003131938.9426-1-hdegoede@xxxxxxxxxx/
>>>
>>>
>>> That would be great. I think we definitely want to avoid having 2
>>> APIs for the same thing here. Again I don't think that is actually
>>> the case, but maybe you see this differently ?
>>
>> Thanks for pointing this out. Actually, it is a different feature as you
>> mentioned. The profile is the same knob we have with the BIOS where we
>> can choose power / balanced power / balanced / balanced performance /
>> performance, AFAICT.
>
> Right.
>
>> Here the proposed interface is already exported in userspace via the
>> powercap framework which supports today the backend driver for the RAPL
>> register.
>
> You say that some sort of power / balanced power / balanced /
> balanced performance / performance setting is already exported
> through the powercap interface today (if I understand you correctly)?

Sorry, I was unclear. By 'Here the proposed interface' I was referring to
powercap/dtpm. There is no profile interface in the powercap framework.

> But I'm not seeing any such setting in:
> Documentation/ABI/testing/sysfs-class-powercap
>
> Nor can I find it under /sys/class/powercap/intel-rapl* on a ThinkPad
> X1 carbon 8th gen.
>
> Note, if there indeed is an existing userspace API for this I would
> greatly prefer for the thinkpad_acpi and hp-wmi (and possibly other)
> drivers to use this, so if you can point me to this interface then
> that would be great.
>
>> The userspace will be in charge of handling the logic to have the
>> correct power/performance profile tuned against the current application
>> running foreground. The DTPM framework gives the unified access to the
>> power limitation to the individual devices the userspace logic can act
>> on.
>>
>> A side note, related to your proposal, not this patch. IMO it suits
>> better to have /sys/power/profile.
>>
>> cat /sys/power/profile
>>
>> power
>> balanced_power *
>> balanced
>> balanced_performance
>> performance
>>
>> The (*) being the active profile.
>
> Interesting, the same thing was brought up in the discussion surrounding
> the RFC which I posted.
>
> The downside of this approach is that it assumes that there
> is only a single system-wide setting. AFAIK that is not always
> the case, e.g. (AFAIK):
>
> 1. The intel pstate driver has something like this
>    (might this be the rapl setting you mean? )
>
> 2. The X1C8 has such a setting for the embedded-controller, controlled
>    through the ACPI interfaces which thinkpad-acpi uses
>
> 3. The hp-wmi interface allows selecting a profile which in turn
>    (through AML code) sets a bunch of variables which influence how
>    the (dynamic, through mjg59's patches) DPTF code controls various
>    things
>
> At least the pstate setting and the vendor specific settings can
> co-exist. Also the powercap API has a notion of zones, I can see the
> same thing here, with a desktop e.g. having separate performance-profile
> selection for the CPU and a discrete GPU.
>
> So limiting the API to a single /sys/power/profile setting seems a
> bit limited and I have the feeling we will regret making this
> choice in the future.
>
> With that said your proposal would work well for the current
> thinkpad_acpi / hp-wmi cases, so I'm not 100% against it.
>
> This would require adding some internal API to the code which
> owns the /sys/power root-dir to allow registering a profile
> provider I guess. But that would also immediately bring the
> question, what if multiple drivers try to register themselves
> as /sys/power/profile provider ?

Did you consider putting the profile on a per-device basis?

e.g.

/sys/devices/system/cpu/cpu[0-9]/power/profile

Maybe make 'energy_performance_preference' in
/sys/devices/system/cpu/cpufreq obsolete?

When the profile is set on one device, all its children get the same profile.

e.g.

A change in /sys/devices/system/cpu/power/profile will impact all the
underlying cpu[0-9]/power/profile.

Or a change in /sys/devices/power/profile will change all profiles below
/sys/devices.

Well, that is a high-level suggestion; I don't know how it can fit with
the cyclic sysfs hierarchy.
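
To make the inheritance idea above a bit more concrete, here is a minimal
sketch as a toy Python model, not kernel code. The Node class, set_profile()
and build_example() are hypothetical names, and the default of 'balanced' is
an assumption; the profile names are the ones from the /sys/power/profile
example quoted above. It also assumes a plain tree, so it sidesteps the
question of cycles in the sysfs hierarchy.

```python
# Toy model only: these classes and names are hypothetical, not a kernel API.
from dataclasses import dataclass, field
from typing import List

# Profile names taken from the /sys/power/profile example in this thread.
PROFILES = ("power", "balanced_power", "balanced",
            "balanced_performance", "performance")


@dataclass
class Node:
    """One directory in the device tree that carries a power/profile file."""
    name: str
    profile: str = "balanced"  # assumed default, not specified in the thread
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def set_profile(self, profile: str) -> None:
        """Set the profile here and push it down to every descendant."""
        if profile not in PROFILES:
            raise ValueError(f"unknown profile: {profile}")
        self.profile = profile
        for child in self.children:
            child.set_profile(profile)


def build_example() -> Node:
    """Build devices -> system -> cpu -> cpu0..cpu3 (4 CPUs is arbitrary)."""
    devices = Node("devices")
    system = devices.add(Node("system"))
    cpu = system.add(Node("cpu"))
    for i in range(4):
        cpu.add(Node(f"cpu{i}"))
    return devices
```

In this model, writing 'power' at the cpu node changes cpu0..cpu3 but leaves
the devices node untouched, while a write on a single cpuN only affects that
CPU. A later write on a parent overrides whatever a child had set before.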

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs