Re: [PATCH v5 1/2] dt-bindings: cpufreq: add virtual cpufreq device

From: Quentin Perret
Date: Mon Mar 11 2024 - 07:40:57 EST


On Sunday 04 Feb 2024 at 10:23:00 (+0000), Marc Zyngier wrote:
> Well, I've said it before, and I'll say it again: the use of
> *frequencies* makes no sense. It is a lie (it doesn't describe any
> hardware, physical nor virtual), and doesn't reflect the way the
> emulated cpufreq controller behaves either (since it scales everything
> back to what the host can potentially do)
>
> The closest abstraction we have to this is the unit-less capacity. And
> *that* reflects the way the emulated cpufreq controller works while
> avoiding lying to the guest about some arbitrary frequency.
>
> In practice, this changes nothing to either the code or the behaviour.
> But it changes the binding.

Apologies all for jumping late into this, but for what it's worth,
regardless of the unit of the binding, Linux will shove that into
cpufreq's 'frequency table' anyway, which as the name suggests is very
much assuming frequencies :/ -- see how struct cpufreq_frequency_table
explicitly requires kHz. The worst part is that this even ends up
being reported to _userspace_ as frequencies in sysfs via cpufreq's
scaling_available_frequencies file, even when they're really not...
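
To make that concrete, here is a rough userspace sketch (not kernel
code). The struct is a simplified stand-in for struct
cpufreq_frequency_table, and every mock_* name is made up for
illustration. The point is only that nothing in the table records a
unit, so whatever number goes in comes back out looking like kHz:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's end-of-table sentinel. */
#define MOCK_TABLE_END (~0u)

/* Hypothetical, simplified mirror of struct cpufreq_frequency_table. */
struct mock_freq_entry {
	unsigned int driver_data; /* driver-private cookie */
	unsigned int frequency;   /* consumers assume kHz */
};

/*
 * Format the table the way a scaling_available_frequencies-style file
 * would: space-separated values followed by a newline. Returns the
 * number of characters written, excluding the terminating NUL.
 */
static int mock_show_available_freqs(const struct mock_freq_entry *table,
				     char *buf, size_t len)
{
	const struct mock_freq_entry *e;
	size_t pos = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (e = table; e->frequency != MOCK_TABLE_END; e++) {
		int n = snprintf(buf + pos, len - pos, "%u ", e->frequency);

		if (n < 0 || (size_t)n >= len - pos) {
			buf[pos] = '\0'; /* drop the truncated entry */
			break;
		}
		pos += (size_t)n;
	}

	if (pos + 1 < len) {
		buf[pos++] = '\n';
		buf[pos] = '\0';
	}
	return (int)pos;
}
```

Feeding it capacity-style values such as 256, 512 and 1024 yields
"256 512 1024 " followed by a newline, which any reader of the file
would reasonably take to be kHz.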

In the case of SCMI for example, IIRC the firmware can optionally (and
in practice I think it does for all older implementations of the spec at
least) report unit-less operating points to the driver, which will then
happily pretend these are kHz values when reporting them to PM_OPP and
cpufreq -- see how scmi_dvfs_device_opps_add() simply multiplies the
level's 'perf' member by 1000 when populating PM_OPP (which is then
propagated to cpufreq's freq_table). And a small extract from the SCMI
spec:

"Certain platforms use IMPLEMENTATION DEFINED indices to identify
performance levels. Level Indexing Mode is used to describe such
platform behavior. The level indices associated with performance
levels are neither guaranteed to be contiguous nor required to be
on a linear scale."
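
A minimal sketch of the conversion described above, as a userspace
approximation rather than the actual driver code. The mock_* names are
hypothetical, and it assumes (from memory) that PM_OPP holds rates in
Hz and that the cpufreq table is derived by dividing those by 1000:

```c
#include <stdint.h>

/* Hypothetical stand-in for one firmware-reported performance level. */
struct mock_perf_level {
	uint32_t perf; /* may be a unit-less index, not a frequency */
};

/* Treat the level as if it were kHz and convert to Hz for the OPP. */
static uint64_t mock_level_to_opp_hz(const struct mock_perf_level *lvl)
{
	return (uint64_t)lvl->perf * 1000;
}

/* Convert the OPP rate back to kHz for the cpufreq table. */
static unsigned int mock_opp_hz_to_cpufreq_khz(uint64_t hz)
{
	return (unsigned int)(hz / 1000);
}
```

With that, a level index of 3 becomes a 3000 Hz OPP and then a 3 kHz
cpufreq entry, even though the firmware never claimed it was a
frequency at all.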

Not nice, but unfortunately the core cpufreq framework has way too many
historical dependencies on things being frequencies to really change it
now, so we're pretty much stuck with that :(

So, while I do agree with the sentiment that this is a non-ideal place
to be, 'faking' frequencies is how we've addressed this so far in Linux,
so I'm personally not too fussed about David's usage of a freq-based DT
binding in this particular instance. On the plus side, that lets us
re-use all of PM_OPP and cpufreq infrastructure as-is, so that's cool.

I guess we could make the argument that Linux's approach to handling
frequencies shouldn't influence this given that the binding should be OS
agnostic, but I can easily see how another OS could still make use of
that binding (and in fact requiring that this other OS can deal with
unitless frequencies is most likely going to be a bigger problem), so
I'd be inclined to think this isn't a major problem either.

Thanks,
Quentin