Re: [PATCH v1 2/2] PM: QoS: Add a performance QoS

From: Daniel Lezcano
Date: Tue Dec 19 2023 - 07:33:39 EST



Hi Caleb,

[Cc'ed Viresh]

On 13/12/2023 19:35, Caleb Connolly wrote:
Hi Daniel,

On 13/12/2023 17:58, Daniel Lezcano wrote:
Currently cpufreq and devfreq are using the freq QoS to aggregate the
requests for frequency ranges.

However, there are new devices wanting to act not on a frequency range
but on a performance index range. Those also need to export to
userspace a knob to act on their performance limits.

This change provides a performance limiter QoS based on minimum /
maximum performance values. At init time, the limits of the interval
are 0 / 1024. It is up to the backend to convert 1024 to the
maximum performance state. So if the performance must be limited to
50%, the maximum limit should be set to 512 and the backend will end
up converting it to (max performance index / 2). The same applies for
the minimum. Obviously, the min can not be greater than the max.
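
To make the conversion concrete, a backend could do something along
these lines (only a sketch; the perf_qos_to_index() helper and the
linear mapping are my own illustration, not part of the patch):

/*
 * Illustrative backend conversion: map an aggregated QoS value in the
 * fixed [0, 1024] interval to a device specific performance index in
 * [0, max_index]. Plain C integer division rounds toward zero.
 */
static unsigned int perf_qos_to_index(unsigned int qos_value,
                                      unsigned int max_index)
{
        if (qos_value > 1024)
                qos_value = 1024;

        return qos_value * max_index / 1024;
}

With qos_value = 512 and a device whose maximum performance index is N,
this yields N / 2, i.e. the 50% case described above.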

I really feel like it should be possible to have arbitrary min/max
performance values, as is the case with latency and frequency.

We had an initial discussion about the performance QoS some weeks ago. Rafael is reluctant to have arbitrary values, so a 1024-based approach was proposed, letting the backend convert the value to its index.

If we go for an approach similar to the frequencies, then we would need more files to describe the different states, at least one each for the current state, the min and the max.


1. With the example above, if there is an odd number like 5 for the
number of performance indexes and we ask for 512 (so 50%), what would
be the computed performance index? (5/2=2 or 5/2=3)? (I would say the
minimum, otherwise we end up with a performance limit greater than
what we actually asked for).
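
(For illustration, with the linear mapping sketched above and plain
integer arithmetic, 512 * 5 / 1024 = 2560 / 1024 = 2, so the conversion
naturally rounds down.)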

For a device with just a handful of performance indices, this is quite a
large margin for error. If there are just 3, for example, and some
algorithm is decreasing the performance level over time (e.g. due to
some thermal condition), the algorithm cannot determine at what point
the device's performance level has actually changed, making debugging and
tuning of behaviour needlessly difficult.

Yes, it is a valid point. Maybe we can find an intermediate approach.

If we define an additional piece of information, let's call it "granularity" for example, and keep the 0-1024 range, then userspace can rely on this information to build the steps.

If we take your example of a device with 3 performance states, then the granularity would be:

1024 / 3 = 341.3

As floating point does not exist in the kernel, this would be rounded up to 342.

State 0 = 0 x 342 = 0
State 1 = 1 x 342 = 342
State 2 = 2 x 342 = 684
State 3 = 3 x 342 = 1026 (capped to 1024)

So we end up with a fixed range, a way to quickly step through the levels, and three files in the device's power sysfs entry.
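
As a rough sketch of how the granularity and the resulting steps could
be computed (the helper names are invented for illustration, not an
existing API):

#include <linux/math.h>
#include <linux/minmax.h>

/* 1024 / num_states, rounded up as there is no floating point in the kernel */
static unsigned int perf_qos_granularity(unsigned int num_states)
{
        return DIV_ROUND_UP(1024, num_states);
}

/* QoS value corresponding to performance state 'state', capped at 1024 */
static unsigned int perf_state_to_qos(unsigned int state,
                                      unsigned int granularity)
{
        return min(state * granularity, 1024U);
}

With 3 performance states this gives a granularity of 342 and the steps
0, 342, 684 and 1024, matching the list above. Userspace would only need
the granularity (exposed for instance as one of the sysfs files) to
rebuild the same steps.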

This also leaves it up to the backend driver to decide if it should
round up or down, something that should definitely be handled by the
framework.

Maybe I missed some previous discussion, but isn't this what
operating-points is designed for?

It has an `opp-level` property, but that is meant to be device-specific,
with the `opp-hz` property being the "normalised" values that the
framework deals with.

We would just want some way to define an `opp-level` as a percentage
(or whatever), with an arbitrary `opp-performance-index` being the
device-specific property.

This also gracefully handles non-linear performance scaling.

I think it is a different subject; that one is about how to describe the hardware and these performance states. But I agree, it is worth keeping the OPP description in mind.

[ ... ]

--
Linaro.org │ Open source software for ARM SoCs
