Re: [PATCH v3 0/5] Rework system pressure interface to the scheduler

From: Vincent Guittot
Date: Tue Jan 09 2024 - 08:30:09 EST


On Tue, 9 Jan 2024 at 12:34, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 08/01/2024 14:48, Vincent Guittot wrote:
> > Following the consolidation and cleanup of CPU capacity in [1], this series
> > reworks how the scheduler gets the pressures on CPUs. We need to take into
> > account all pressures applied by cpufreq on the compute capacity of a CPU
> > for dozens of ms or more, and not only the cpufreq cooling device or HW
> > mitigations. We split the pressure applied on a CPU's capacity into 2 parts:
> > - one from cpufreq and freq_qos
> > - one from HW high freq mitigation.
> >
> > The next step will be to add a dedicated interface for long-standing
> > capping of the CPU capacity (i.e. for seconds or more) like the
> > scaling_max_freq of cpufreq sysfs. The latter is already taken into
> > account by this series, but as a temporary pressure, which is not always
> > the best choice when we know that it will last for seconds or more.
>
> I guess this is related to the 'user space system pressure' (*) slide of
> your OSPM '23 talk.

Yes.

>
> Where do you draw the line when it comes to time between (*) and the
> 'medium pace system pressure' (e.g. thermal and FREQ_QOS)?

My goal is to treat /sys/../scaling_max_freq as the 'user space
system pressure'.

>
> IIRC, with (*) you want to rebuild the sched domains etc.

The easiest way would be to rebuild the sched_domain, but the cost is
not small, so I would prefer to skip the rebuild and add a new signal
that keeps track of this capped capacity.
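
As a rough illustration only (this is not code from the series), such a
signal could be derived from the capped maximum frequency. The sketch
below assumes that capacity scales linearly with frequency; the helper
name capped_capacity_pressure() and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the series: capacity lost when the
 * maximum frequency of a CPU is capped (e.g. through scaling_max_freq).
 * Assumption: compute capacity scales linearly with frequency.
 */
static unsigned long capped_capacity_pressure(unsigned long max_capacity,
					      unsigned long capped_freq,
					      unsigned long max_freq)
{
	assert(max_freq > 0);

	/* A cap at or above the maximum frequency applies no pressure. */
	if (capped_freq >= max_freq)
		return 0;

	/* Widen the product so it cannot overflow a 32-bit unsigned long. */
	return max_capacity -
	       (unsigned long)(((unsigned long long)max_capacity * capped_freq) /
			       max_freq);
}
```

With a max_capacity of 1024 and a cap at half of the maximum frequency,
this helper returns 512, i.e. half of the capacity is reported as lost.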

>
> >
> > [1] https://lore.kernel.org/lkml/20231211104855.558096-1-vincent.guittot@xxxxxxxxxx/
> >
> > Changes since v2:
> > - Rework cpufreq_update_pressure()
> >
> > Changes since v1:
> > - Use struct cpufreq_policy as parameter of cpufreq_update_pressure()
> > - Fix typos and comments
> > - Mark the sched_thermal_decay_shift boot param as deprecated
> >
> > Vincent Guittot (5):
> > cpufreq: Add a cpufreq pressure feedback for the scheduler
> > sched: Take cpufreq feedback into account
> > thermal/cpufreq: Remove arch_update_thermal_pressure()
> > sched: Rename arch_update_thermal_pressure into
> > arch_update_hw_pressure
> > sched/pelt: Remove shift of thermal clock
> >
> > .../admin-guide/kernel-parameters.txt | 1 +
> > arch/arm/include/asm/topology.h | 6 +-
> > arch/arm64/include/asm/topology.h | 6 +-
> > drivers/base/arch_topology.c | 26 ++++----
> > drivers/cpufreq/cpufreq.c | 36 +++++++++++
> > drivers/cpufreq/qcom-cpufreq-hw.c | 4 +-
> > drivers/thermal/cpufreq_cooling.c | 3 -
> > include/linux/arch_topology.h | 8 +--
> > include/linux/cpufreq.h | 10 +++
> > include/linux/sched/topology.h | 8 +--
> > .../{thermal_pressure.h => hw_pressure.h} | 14 ++---
> > include/trace/events/sched.h | 2 +-
> > init/Kconfig | 12 ++--
> > kernel/sched/core.c | 8 +--
> > kernel/sched/fair.c | 63 +++++++++----------
> > kernel/sched/pelt.c | 18 +++---
> > kernel/sched/pelt.h | 16 ++---
> > kernel/sched/sched.h | 22 +------
> > 18 files changed, 144 insertions(+), 119 deletions(-)
> > rename include/trace/events/{thermal_pressure.h => hw_pressure.h} (55%)
>