Re: [PATCH] thermal: intel_powerclamp: Fix cur_state for multi package system

From: Rafael J. Wysocki
Date: Wed Feb 01 2023 - 14:22:33 EST


On Wed, Feb 1, 2023 at 8:19 PM srinivas pandruvada
<srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2023-02-01 at 20:10 +0100, Rafael J. Wysocki wrote:
> > On Wed, Feb 1, 2023 at 7:06 PM Srinivas Pandruvada
> > <srinivas.pandruvada@xxxxxxxxxxxxxxx> wrote:
> > >
> > > The powerclamp cooling device cur_state shows actual idle observed by
> > > package C-state idle counters. But the implementation is not sufficient
> > > for multi package or multi die system. The cur_state value is incorrect.
> > > On these systems, these counters must be read from each package/die and
> > > somehow aggregated. But there is no good method for aggregation.
> > >
> > > It was not a problem when explicit CPU model addition was required to
> > > enable intel powerclamp. In this way certain CPU models could have
> > > been avoided. But with the removal of CPU model check with the
> > > availability of Package C-state counters, the driver is loaded on most
> > > of the recent systems.
> > >
> > > For multi package/die systems, just show the actual target idle state
> > > the system is trying to achieve. In powerclamp this is the user set
> > > state minus one.
> > >
> > > Also there is no use of starting a worker thread for polling package
> > > C-state counters and applying any compensation.
> >
> > I think that the last paragraph applies to systems with multiple
> > dies/packages?
> Yes.
>
> >
> > > Fixes: b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist")
> >
> >
> >
> > > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > Cc: stable@xxxxxxxxxxxxxxx # 4.14+
> > > ---
> > > drivers/thermal/intel/intel_powerclamp.c | 20 ++++++++++++++++----
> > > 1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/thermal/intel/intel_powerclamp.c
> > > b/drivers/thermal/intel/intel_powerclamp.c
> > > index b80e25ec1261..64f082c584b2 100644
> > > --- a/drivers/thermal/intel/intel_powerclamp.c
> > > +++ b/drivers/thermal/intel/intel_powerclamp.c
> > > @@ -57,6 +57,7 @@
> > >
> > > static unsigned int target_mwait;
> > > static struct dentry *debug_dir;
> > > +static bool poll_pkg_cstate_enable;
> > >
> > > /* user selected target */
> > > static unsigned int set_target_ratio;
> > > @@ -261,6 +262,9 @@ static unsigned int get_compensation(int ratio)
> > > {
> > > unsigned int comp = 0;
> > >
> > > + if (!poll_pkg_cstate_enable)
> > > + return 0;
> > > +
> > > /* we only use compensation if all adjacent ones are good */
> > > if (ratio == 1 &&
> > > cal_data[ratio].confidence >= CONFIDENCE_OK &&
> > > @@ -519,7 +523,8 @@ static int start_power_clamp(void)
> > > control_cpu = cpumask_first(cpu_online_mask);
> > >
> > > clamping = true;
> > > - schedule_delayed_work(&poll_pkg_cstate_work, 0);
> > > + if (poll_pkg_cstate_enable)
> > > + schedule_delayed_work(&poll_pkg_cstate_work, 0);
> > >
> > > /* start one kthread worker per online cpu */
> > > for_each_online_cpu(cpu) {
> > > @@ -585,11 +590,15 @@ static int powerclamp_get_max_state(struct thermal_cooling_device *cdev,
> > > static int powerclamp_get_cur_state(struct thermal_cooling_device *cdev,
> > > unsigned long *state)
> > > {
> > > - if (true == clamping)
> > > - *state = pkg_cstate_ratio_cur;
> > > - else
> > > + if (true == clamping) {
> >
> > This really should be
> I can change that, just kept the old style.
> I will send an update.
>
> >
> > if (clamping) {
> >
> > > + if (poll_pkg_cstate_enable)
> > > + *state = pkg_cstate_ratio_cur;
> > > + else
> > > + *state = set_target_ratio;
> > > + } else {
> > > /* to save power, do not poll idle ratio while not clamping */
> > > *state = -1; /* indicates invalid state */
> > > + }
> > >
> > > return 0;
> > > }
> > > @@ -712,6 +721,9 @@ static int __init powerclamp_init(void)
> > > goto exit_unregister;
> > > }
> > >
> > > + if (topology_max_packages() == 1 && topology_max_die_per_package() == 1)
> > > + poll_pkg_cstate_enable = true;
> > > +
> > > cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL,
> > > &powerclamp_cooling_ops);
> > > if (IS_ERR(cooling_dev)) {
> > > --
> >
> > This fixes a rather old bug and we are late in the cycle, so I'm a bit
> > reluctant to push it for -rc7 or -rc8. I would prefer to apply it for
> > 6.3, but let it go before the other powerclamp driver changes from
> > you.
> Yes, that's why I rebased other patches on top of this.
>
> > This way, if anyone needs to backport it or put it into
> > -stable, they will be able to do that without pulling in the more
> > intrusive material.
> >
> > Now, I do realize that this avoids changing the current behavior too
> > much, but I think that it is plain confusing to return
> > pkg_cstate_ratio_cur from powerclamp_get_cur_state() in any case. It
> > should always return set_target_ratio IMV.
> It should. It is an unnecessary complication. When I use it in thermald, I
> don't look at the returned value from cur_state as this doesn't matter
> if the temperature is not under control. I will change this for all
> cases.

I think that this should be a separate patch, though, not to be
confused with the fix.
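
For reference, the follow-up could look roughly like the sketch below. It
is untested and standalone, not the driver code: the thermal_cooling_device
argument is dropped, clamping and set_target_ratio are plain stand-ins for
the driver's file-scope variables, and sketch_get_cur_state() is a made-up
name used only for illustration.

```c
#include <stdbool.h>

/* Stand-ins for the driver's file-scope state (simplified, hypothetical). */
static bool clamping;
static unsigned int set_target_ratio;

/*
 * Sketch of a get_cur_state callback that never reports the measured
 * package C-state ratio: while clamping it returns the user-set target,
 * otherwise it keeps the existing "invalid state" marker.
 */
static int sketch_get_cur_state(unsigned long *state)
{
	if (clamping)
		*state = set_target_ratio;
	else
		*state = -1; /* indicates invalid state, as in the current code */

	return 0;
}
```

With that, the value read back would no longer depend on whether the
package C-state counters are polled at all, so single-package and
multi-package systems would report the same thing.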