Re: [PATCH v2 0/2] intel_powerclamp: New module parameter

From: srinivas pandruvada
Date: Mon Feb 06 2023 - 05:03:50 EST


On Mon, 2023-02-06 at 08:05 +0000, Zhang, Rui wrote:
> On Sun, 2023-02-05 at 18:45 -0800, srinivas pandruvada wrote:
> > Hi Rui,
> >
> > On Sun, 2023-02-05 at 15:57 +0000, Zhang, Rui wrote:
> > > Hi, Srinivas,
> > >
> > > First of all, the previous build error is gone.
> > >
> > > Second, I found something strange, which may be related to the
> > > scheduler asym-packing, so CC Ricardo.
> > >
> > I thought you disabled ITMT before idle injection and re-enabled it
> > after removal.
>
> No.
>
> I can reproduce this by playing with raw intel_powerclamp sysfs knobs
> and ITMT enabled.
>

This issue happens even with ITMT disabled. If the module mask is
composed only of P-cores, it works as expected, and it also works as
expected on servers. It also works if you offline all P-cores and then
select a mask among the E-cores. Somehow the P-cores influence the E-cores.

This patch only concerns the module mask, and that part is functioning
correctly. We have to debug this interaction between P-cores and E-cores
separately.
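
To make it easier to talk about which CPUs a given mask covers, here is a
minimal sketch for decoding the hex cpumask values used in this thread. It
assumes the ADL layout Rui describes below (cpu0-cpu7 P-core, cpu8-cpu15
E-core); the helper names are made up for illustration and are not part of
the patch.

```python
# Assumed layout from the ADL test system in this thread; adjust as needed.
P_CORES = range(0, 8)
E_CORES = range(8, 16)


def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in a hex cpumask string."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def covers_all_ecores(mask_hex):
    """True if every assumed E-core CPU is included in the mask."""
    selected = set(cpus_in_mask(mask_hex))
    return all(cpu in selected for cpu in E_CORES)


def split_by_core_type(mask_hex):
    """Split the selected CPUs into (P-core CPUs, E-core CPUs)."""
    selected = cpus_in_mask(mask_hex)
    p_cpus = [cpu for cpu in selected if cpu in P_CORES]
    e_cpus = [cpu for cpu in selected if cpu in E_CORES]
    return p_cpus, e_cpus
```

For example, split_by_core_type("fff3") shows that fff3 selects all eight
E-core CPUs plus a subset of the P-core CPUs, which is the mixed case where
the problem shows up.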

Thanks,
Srinivas


> >
> >
> >
> > > The test is done with pm linux-intel branch
>
> sorry, I mean linux-next branch.
>
> > >  + this patch series on an
> > > ADL system.
> > Can you test on the bleeding-edge branch of linux-pm?
> >
> > >  cpu0~cpu7 are Pcore cpus, cpu8-cpu15 are Ecore cpus, and
> > > intel_powerclamp is registered as cooling_device21.
> > >
> > > 1. run stress -c 16
> > > 2. update /sys/module/intel_powerclamp/parameters/cpumask
> > >    echo 90 > /sys/module/intel_powerclamp/parameters/max_idle
> > > 3. echo 90 > /sys/class/thermal/cooling_device21/cur_state
> > > 4. echo 0 > /sys/class/thermal/cooling_device21/cur_state
> > > I use turbostat to monitor the CPU Busy% in all 4 steps.
> > >
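
The steps above can be sketched as a small script so the hold time between
step 3 and step 4 is repeatable. This is only a sketch under assumptions:
cooling_device21 is specific to this system, "stress -c 16" and turbostat
are assumed to be running separately, and the function names and the root
parameter (there so the paths can be pointed at a scratch directory) are
made up for illustration.

```python
import time
from pathlib import Path

PARAMS = "sys/module/intel_powerclamp/parameters"
# The cooling device index differs between systems; 21 is from this report.
COOLING_DEV = "sys/class/thermal/cooling_device21"


def write_knob(root, rel_path, value):
    """Write one value to a sysfs-style file below root."""
    (Path(root) / rel_path).write_text(f"{value}\n")


def run_injection_cycle(cpumask, max_idle=90, hold_seconds=20,
                        root="/", sleep=time.sleep):
    """Steps 2-4: set the knobs, inject idle, hold, then remove injection."""
    # Step 2: choose the CPUs and the maximum idle percentage.
    write_knob(root, f"{PARAMS}/cpumask", cpumask)
    write_knob(root, f"{PARAMS}/max_idle", max_idle)
    # Step 3: start idle injection.
    write_knob(root, f"{COOLING_DEV}/cur_state", max_idle)
    # Time spent in idle injection before it is removed.
    sleep(hold_seconds)
    # Step 4: remove idle injection.
    write_knob(root, f"{COOLING_DEV}/cur_state", 0)
```

Running run_injection_cycle("fff3") as root, with hold_seconds varied
between 10 and 20, would match the timing described further down.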
> > > If 'cpumask' does not include all the Ecore CPUs, all CPUs become
> > > 100% busy after idle injection is removed in step 4.
> > >
> > that should be the case.
> >
> > > If 'cpumask' includes all the Ecore CPUs, i.e. cpumask = FFxy, in
> > > some cases the Ecore CPUs drop to a Busy% much lower than 10%,
> > > and then they don't come back to busy after idle injection is
> > > removed in step
> >  Do you see the message that idle injection is removed in dmesg?
>
> yes.
>
> > We can also check powercap idle-inject, in case some CPUs still have
> > not woken from play_idle.
>
> "ps" command shows that the idle_injection threads' time is not
> increasing any more.
>
> >
> >
> > > 4, although we have 16 stress threads. And this also relates to
> > > how long we stay in idle injection.
> > >
> > > Say, when cpumask=fff3, the problem can be triggered occasionally
> > > if there is a 10 second timeout between step 3 and step 4, but it
> > > is much easier to reproduce if I increase the timeout to 20 seconds.
> > >
> > > It seems that Pcore can always pull tasks from Ecores, but Ecore
> > > cannot pull tasks from Pcore HT siblings.
> > >
> > That is what the regular load balance threads should do.
> > Better to try the upstream kernel first.
>
> I'm already running with linux-pm tree linux-next branch + this patch
> series.
>
> thanks,
> rui
>
> >
> > Thanks,
> > Srinivas
> >
> >
> > > thanks,
> > > rui
> > >
> > > On Sat, 2023-02-04 at 18:59 -0800, Srinivas Pandruvada wrote:
> > > > > Split from the series for powerclamp user of powercap
> > > > > idle-inject.
> > > >
> > > > v2
> > > > - Build warnings reported by Rui
> > > > - Moved the powerclamp documentation to admin guide folder
> > > > > - Commit log updated as suggested by Rafael and other code
> > > > > suggestions
> > > >
> > > > Srinivas Pandruvada (2):
> > > > >   Documentation:admin-guide: Move intel_powerclamp documentation
> > > >   thermal/drivers/intel_powerclamp: Add two module parameters
> > > >
> > > >  Documentation/admin-guide/index.rst           |   1 +
> > > >  .../thermal/intel_powerclamp.rst              |  22 +++
> > > >  Documentation/driver-api/thermal/index.rst    |   1 -
> > > >  MAINTAINERS                                   |   1 +
> > > > >  drivers/thermal/intel/intel_powerclamp.c      | 177 +++++++++++++++---
> > > >  5 files changed, 180 insertions(+), 22 deletions(-)
> > > > >  rename Documentation/{driver-api => admin-guide}/thermal/intel_powerclamp.rst (93%)
> > > >