Re: thermal/drivers/tegra: Getting rid of the get_thermal_instance() usage

From: Thierry Reding
Date: Tue Feb 07 2023 - 07:18:38 EST


On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote:
>
> Hi Thierry,
>
> did you have the time to look at the get_thermal_instance() removal ?
>
>
> On 26/01/2023 13:55, Thierry Reding wrote:
>
> > [ 12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > [ 12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > [ 12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > [ 12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > [ 12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > [ 12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > [ 12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > [ 12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > [ 12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> >
> > and after these changes, it turns into:
> >
> > [ 12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> > [ 12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> > [ 12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> > [ 12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> > [ 12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> > [ 12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> > [ 12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> > [ 12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> > [ 12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> > [ 12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> > [ 12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> > [ 12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> > [ 12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> >
> > The "programming throttle ..." messages are something I've added locally
> > to trace what gets called. So it looks like for "pll" and "mem" thermal
> > zones, we now program trip points whereas we previously didn't.

My diagnosis above wasn't entirely correct. We're not actually skipping
trip point programming for PLL and MEM thermal zones in the current
code. Instead, we skip throttle programming. As far as I can tell this
is a mechanism built into ACTMON to allow it to automatically throttle
when a zone reaches a certain temperature.

This is modelled as a cooling device, but internally it's actually done
automatically, which is why we have this code that programs the throttle
at driver probe time, rather than the on-demand programming that typical
cooling device would do (such as a fan).

The reason why we have get_thermal_instance() here is to check if this
built-in cooling device has been configured for the "hot" trip point. If
not, we don't want the throttle programming to happen. This adds the
added flexibility of explicitly disabling the automatic throttling by
ACTMON and using another cooling device (or none at all) if that's what
is needed.

Dropping just the call to get_thermal_instance() and relying on the
find_throttle_cfg_by_name() function will always return a valid throttle
configuration. This is slightly obfuscated because of this:

cdev = ts->throt_cfgs[i].cdev;
if (get_thermal_instance(tz, cdev, trip_id))
stc = find_throttle_cfg_by_name(ts, cdev->type);

As far as I can tell this will always return &ts->throt_cfgs[i], so the
find_throttle_cfg_by_name() call is a bit redundant here. I'll look into
fixing that.

In any case, the important thing is that it would always find a valid
throttle configuration and therefore program the throttle, even if we
may not want to.

Possibly we could work around that by removing this fiddly special case
and instead add a new callback for the cooling devices that can be run
when they are bound to a thermal zone. This would allow the throttle
programming to be initiated from within the thermal core rather than
"bolted on" like it is now and should allow us to achieve the same
effect but without calling into get_thermal_instance().

I'll try and prototype this, but feel free to suggest anything better if
you can think of something.

Thierry

Attachment: signature.asc
Description: PGP signature