Re: [PATCH RESEND RESEND] thermal/of: support thermal zones w/o trips subnode

From: Icenowy Zheng
Date: Sun Jul 23 2023 - 22:36:03 EST


On Sunday, 2023-07-23 at 16:05 +0100, Mark Brown wrote:
> On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> > On 22/07/2023 22:11, Mark Brown wrote:
>
> > > This makes sense to me - it allows people to see the reported
> > > temperature even if there's no trips defined which seems more
> > > helpful than refusing to register.
>
> ...
>
> > If the goal is to report the temperature only, then hwmon should be used
> > instead.
>
> Sure, that doesn't seem to be the case in the impacted systems though -
> AFAICT the issue with these is that it's a generic SoC DT that's not
> fully fleshed out, either because more data is needed for the silicon or
> because the numbers need to be system specific for some reason.

Well, maybe we should move all thermal sensors to the hwmon framework and
then let the thermal framework pull the readout from hwmon. But the
current situation is that two frameworks share the functionality of
reading temperature, and we shouldn't break things.

>
> > If the goal is to mitigate by userspace, then the trip point *must* be used
> > to prevent the userspace polling the temperature. With the trip point the
> > sensor will be set to fire an interrupt at the given trip temperature.
>
> I'm not clear that a trip point prevents userspace polling if it feels so
> moved?  Is it just that it makes it more likely that someone will
> implement something that polls?
>
> > IOW, trip points are not optional

If it's declared optional in the DT binding in a released kernel version,
then it's optional; at the very least it should be optional in practice
to support this legacy DT binding, and there are even DT files shipped
with the kernel that rely on it being optional. Showing a warning is
okay, but bailing out is not an option, according to my understanding of
the current DT maintenance model.

>
> I can see printing a loud warning given that the system is not fully
> configured (there's a warning already, I did nearly comment on this
> patch downgrading it all the way to a debug log), perhaps even
> suppressing the registration of the userspace interface, but returning a
> failure to the registering driver feels like it's escalating the problem
> and complicating the driver code.  Suppressing the registration to
> userspace seemed like it was adding more complexity in the core but it
> would avoid any potential confusion for userspace.
>
> For me the main issue is the impact on devices that support multiple
> thermal zones, in order to avoid having working zones stay registered
> their drivers will all have to handle the possibility of some of the
> zones failing to register due to missing configuration which is going to

Well, I think in the case of Allwinner SoCs, the thermal sensor is a
multi-channel one, so it's possible that some channels (e.g. the CPU
sensor) are used for thermal throttling while other channels (e.g. the
GPU one, considering that the Mali-400 is quite weak and usually has no
DVFS) are only used for monitoring.

We should allow this kind of configuration in the kernel. Moving
everything to hwmon is an option, but it's too giant a change.
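
To make the multi-channel case concrete, here is a minimal userspace
sketch of the two policies. It is not the thermal core API: struct
zone_desc, register_zone_strict(), register_zone_lenient() and
count_registered() are all hypothetical names made up for illustration,
and the "cpu"/"gpu" zones only stand in for channels with and without a
trips subnode.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for one channel of a multi-channel sensor. */
struct zone_desc {
	const char *name;
	int num_trips;	/* 0 models a DT zone without a trips subnode */
};

/* Strict policy: a zone without trips is refused. */
int register_zone_strict(const struct zone_desc *z)
{
	if (z->num_trips == 0)
		return -EINVAL;
	return 0;
}

/* Lenient policy: warn, but keep the zone for monitoring. */
int register_zone_lenient(const struct zone_desc *z)
{
	if (z->num_trips == 0)
		fprintf(stderr, "%s: no trips, monitoring only\n", z->name);
	return 0;
}

/* Count how many zones end up registered under the given policy. */
int count_registered(const struct zone_desc *zones, int n,
		     int (*reg)(const struct zone_desc *))
{
	int i, ok = 0;

	for (i = 0; i < n; i++)
		if (reg(&zones[i]) == 0)
			ok++;
	return ok;
}
```

With a CPU zone that has trips and a GPU zone that has none, the strict
policy drops the GPU channel (and a real driver would then have to handle
that partial failure), while the lenient policy keeps both.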

> add complexity at both registration and runtime and be easy to miss.
> If the core just accepts the zones then whatever complexity there is
> gets factored out into the core.