[PATCH v2 00/22] sched: Introduce IPC classes for load balance

From: Ricardo Neri
Date: Mon Nov 28 2022 - 08:13:53 EST


Hi,

This is the v2 of the patchset. Since it did not receive strong objections
on the design, I took the liberty of promoting the series from RFC to
PATCH :)

The problem statement and design do not change in this version. Thus, I did
not repeat the cover letter. It can be retrieved here [1].

This series depends on my other patches to use identical asym_packing CPU
priorities on all the SMT siblings of a physical core on x86 [2].

These patches apply cleanly on top of [2]. For convenience, these patches
and [2] can be found here:

https://github.com/ricardon/tip.git rneri/ipc_classes_v2

Thanks and BR,
Ricardo

Changes since v1 (sorted by significance):
* Renamed task_struct::class as task_struct::ipcc. (Joel)
* Use task_struct::ipcc = 0 for unclassified tasks. (PeterZ)
* Renamed CONFIG_SCHED_TASK_CLASSES as CONFIG_IPC_CLASSES. (PeterZ, Joel)
* Dropped patch to take spin lock to read the HFI table from the
scheduler and from the HFI enabling code.
* Implemented per-CPU variables to store the IPCC scores of each class.
These can be read without holding a lock. (PeterZ).
* Dropped patch to expose is_core_idle() outside the scheduler. It is
now exposed as part of [2].
* Implemented cleanups and reworks from PeterZ when collecting IPCC
statistics. I took all his suggestions, except the computation of the
total IPC score of two physical cores.
* Quantified the cost of HRESET.
* Use an ALTERNATIVE macro instead of static_cpu_has() to execute HRESET
when supported. (PeterZ)
* Fixed a bug when selecting a busiest runqueue: when comparing two
runqueues with equal nr_running, we must compute the IPCC score delta
of both runqueues.
* Fixed the placement of the DISABLE_ITD bit: it now goes in the correct
DISABLE_MASK, 14 instead of 13.
* Redefined union hfi_thread_feedback_char_msr to ensure all
bit-fields are packed. (PeterZ)
* Use bit-fields to fit all the ipcc members of task_struct in 4 bytes.
(PeterZ)
* Shortened the names of the IPCC interfaces (PeterZ):
  sched_task_classes_enabled >> sched_ipcc_enabled
  arch_has_task_classes >> arch_has_ipc_classes
  arch_update_task_class >> arch_update_ipcc
  arch_get_task_class_score >> arch_get_ipcc_score
* Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
* Added a comment to clarify why sched_asym_prefer() needs a tie breaker
only in update_sd_pick_busiest(). (PeterZ)
* Renamed functions for accuracy:
  sched_asym_class_prefer() >> sched_asym_ipcc_prefer()
  sched_asym_class_pick() >> sched_asym_ipcc_pick()
* Renamed local variables to improve the layout of the code block I added
in find_busiest_queue(). (PeterZ)
* Removed proposed CONFIG_INTEL_THREAD_DIRECTOR Kconfig option.
* Mark hardware_history_features as __ro_after_init instead of
__read_mostly. (PeterZ)
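
For readers who want a concrete picture of the "bit-fields in 4 bytes"
item above, here is a minimal standalone sketch. It is not taken from the
patches: the struct name, the member names other than ipcc, and the field
widths are all hypothetical and only illustrate the packing idea, with
ipcc = 0 meaning "unclassified" as described in the changelog.

```c
#include <assert.h>

/*
 * Hypothetical layout, for illustration only. Only the name "ipcc" and
 * the "0 == unclassified" convention come from the changelog; the other
 * members and all widths are assumptions.
 */
struct ipcc_fields {
	unsigned int ipcc      : 9;	/* current class; 0 means unclassified */
	unsigned int ipcc_tmp  : 9;	/* assumed: candidate class being evaluated */
	unsigned int ipcc_cntr : 14;	/* assumed: counter for filtering candidates */
};

/* 9 + 9 + 14 = 32 bits, so the members share one 4-byte storage unit. */
static_assert(sizeof(struct ipcc_fields) == 4,
	      "ipcc members must fit in 4 bytes");

/* Returns nonzero if the task has been assigned a class. */
static inline int ipcc_is_classified(const struct ipcc_fields *f)
{
	return f->ipcc != 0;
}
```

Bit-field layout is implementation-defined in C, so the size check is a
compile-time assertion rather than an assumption left implicit.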

[1]. https://lore.kernel.org/lkml/20220909231205.14009-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
[2]. https://lore.kernel.org/lkml/20221122203532.15013-1-ricardo.neri-calderon@xxxxxxxxxxxxxxx/

Ricardo Neri (22):
sched/task_struct: Introduce IPC classes of tasks
sched: Add interfaces for IPC classes
sched/core: Initialize the IPC class of a new task
sched/core: Add user_tick as argument to scheduler_tick()
sched/core: Update the IPC class of the current task
sched/fair: Collect load-balancing stats for IPC classes
sched/fair: Compute IPC class scores for load balancing
sched/fair: Use IPC class to pick the busiest group
sched/fair: Use IPC class score to select a busiest runqueue
thermal: intel: hfi: Introduce Intel Thread Director classes
thermal: intel: hfi: Store per-CPU IPCC scores
x86/cpufeatures: Add the Intel Thread Director feature definitions
thermal: intel: hfi: Update the IPC class of the current task
thermal: intel: hfi: Report the IPC class score of a CPU
thermal: intel: hfi: Define a default class for unclassified tasks
thermal: intel: hfi: Enable the Intel Thread Director
sched/task_struct: Add helpers for IPC classification
sched/core: Initialize helpers of task classification
thermal: intel: hfi: Implement model-specific checks for task classification
x86/cpufeatures: Add feature bit for HRESET
x86/hreset: Configure history reset
x86/process: Reset hardware history in context switch

arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/hreset.h | 30 +++
arch/x86/include/asm/msr-index.h | 6 +-
arch/x86/include/asm/topology.h | 10 +
arch/x86/kernel/cpu/common.c | 30 ++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/process_32.c | 3 +
arch/x86/kernel/process_64.c | 3 +
drivers/thermal/intel/intel_hfi.c | 229 ++++++++++++++++++++++-
include/linux/sched.h | 22 ++-
init/Kconfig | 12 ++
kernel/sched/core.c | 10 +-
kernel/sched/fair.c | 229 ++++++++++++++++++++++-
kernel/sched/sched.h | 60 ++++++
kernel/sched/topology.c | 8 +
kernel/time/timer.c | 2 +-
18 files changed, 653 insertions(+), 13 deletions(-)
create mode 100644 arch/x86/include/asm/hreset.h

--
2.25.1