[GIT PULL] Scheduler changes for v6.5

From: Ingo Molnar
Date: Tue Jun 27 2023 - 06:20:34 EST


Linus,

Please pull the latest sched/core git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2023-06-27

# HEAD: ebb83d84e49b54369b0db67136a5fe1087124dcc sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()

NOTE:

When merging this tree you'll get a new conflict in
drivers/clocksource/hyperv_timer.c, due to overlapping changes.

In case you want to double check your conflict resolution, our -tip CI
conflict resolution can be found at the e31a421069a6 merge commit in the
core/merge -tip branch:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/merge

Scheduler changes for v6.5:

- Scheduler SMP load-balancer improvements:

- Avoid unnecessary migrations within SMT domains on hybrid systems.

Problem:

On hybrid CPU systems (processors with a mixture of higher-frequency
SMT cores and lower-frequency non-SMT cores), under the old code
lower-priority CPUs pulled tasks from the higher-priority cores if
more than one SMT sibling was busy, resulting in many unnecessary
task migrations.

Solution:

The load balancer now recognizes SMT cores with more than one busy
sibling and allows lower-priority CPUs to pull tasks from them. This
avoids superfluous migrations and lets lower-priority cores inspect all
SMT siblings when looking for the busiest queue.
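The policy described above can be illustrated with a simplified, hypothetical model of the pull decision for an idle destination CPU. struct core_state and may_pull_from() are illustrative names, not kernel symbols; the real logic in kernel/sched/fair.c is spread across group classification and the asym_packing checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-core summary; not a kernel data structure. */
struct core_state {
	int  priority;      /* hardware-reported priority, higher is preferred */
	bool is_smt;        /* the core has SMT siblings */
	int  busy_siblings; /* siblings currently running tasks */
};

/*
 * Simplified model: may an idle CPU on core dst pull a task from core src?
 */
static bool may_pull_from(const struct core_state *dst,
			  const struct core_state *src)
{
	assert(src->busy_siblings >= 0);

	/*
	 * An SMT core with more than one busy sibling divides its throughput
	 * among them, so its priority overstates how fast its tasks run:
	 * let any idle CPU help, even one on a lower-priority core.
	 */
	if (src->is_smt && src->busy_siblings > 1)
		return true;

	/* Otherwise only move work towards a higher-priority core. */
	return dst->priority > src->priority;
}
```

In this model, an idle non-SMT core with priority 1 may pull from a priority-2 SMT core while both of its siblings are busy, but not once only one sibling is busy.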

- Implement the 'runnable boosting' feature in the EAS balancer: take CPU
contention into account in frequency selection, in the EAS max util
calculation, and in the load balancer's busiest-CPU selection.

This improves CPU utilization for certain workloads, while leaving other
key workloads unchanged.
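Runnable boosting can be pictured as taking the larger of a CPU's PELT util_avg and runnable_avg signals, so that time tasks spent waiting for the CPU also counts as demand. A minimal sketch, assuming a simplified per-CPU structure; struct cpu_signals and cpu_util_est() are hypothetical names, and the real code lives in kernel/sched/fair.c:

```c
#include <assert.h>

/* Hypothetical, simplified per-CPU PELT signals; not the kernel's layout. */
struct cpu_signals {
	unsigned long util_avg;     /* time spent running tasks */
	unsigned long runnable_avg; /* time tasks were runnable, incl. waiting */
	unsigned long capacity;     /* CPU capacity, e.g. 1024 for the biggest CPU */
};

/*
 * Utilization estimate with optional runnable boosting: when tasks queue
 * up (runnable_avg > util_avg), the larger signal is used.
 */
static unsigned long cpu_util_est(const struct cpu_signals *s, int boost)
{
	unsigned long util = s->util_avg;

	assert(s->capacity > 0);

	if (boost && s->runnable_avg > util)
		util = s->runnable_avg;

	/* Never report more than the CPU can deliver. */
	return util < s->capacity ? util : s->capacity;
}
```

Boosting only ever raises the estimate: without contention the two signals stay roughly equal, so the result barely changes.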

- Scheduler infrastructure improvements:

- Rewrite the x86 scheduler topology setup code by consolidating it
into the build_sched_topology() helper function, which builds the
topology table dynamically.
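The idea of building the topology table dynamically can be sketched in user-space C: a fixed-size table of levels is filled according to the detected features, always leaving a trailing NULL entry as terminator. struct topo_level, build_topology() and its three boolean inputs are hypothetical stand-ins for the kernel's struct sched_domain_topology_level and its CPU feature checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain_topology_level. */
struct topo_level {
	const char *name; /* NULL terminates the table */
};

/* Room for every optional level plus the trailing NULL entry. */
static struct topo_level topology[5];

/* Fill the table from the detected features; returns the number of levels. */
static size_t build_topology(bool smt, bool cluster, bool numa_in_package)
{
	size_t i = 0;

	if (smt)
		topology[i++] = (struct topo_level){ "SMT" };
	if (cluster)
		topology[i++] = (struct topo_level){ "CLS" };
	topology[i++] = (struct topo_level){ "MC" };
	/* With NUMA inside the package, the NUMA levels cover the die. */
	if (!numa_in_package)
		topology[i++] = (struct topo_level){ "DIE" };

	/* There must be room for one trailing NULL entry. */
	assert(i < sizeof(topology) / sizeof(topology[0]));
	topology[i] = (struct topo_level){ NULL };
	return i;
}
```

A single builder replaces several static variants of the table, and the terminator check catches a table that is too small.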

- Resolve the local_clock() vs. noinstr complications by rewriting
the code: provide separate sched_clock_noinstr() and
local_clock_noinstr() functions to be used in noinstr code,
and make sure it is all instrumentation-safe.
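The generic sched_clock() code reads its clock parameters through a seqcount latch (hence raw_read_seqcount_latch_retry() in the shortlog). A latch keeps two copies of the data so a reader can always take a consistent snapshot without waiting, even if it interrupts the writer. A user-space C11 sketch of the technique; struct clock_latch, latch_update(), latch_read_ns() and the ns_per_cyc scale factor are hypothetical simplifications, since the kernel uses mult/shift arithmetic and its own barrier primitives:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock parameters for one copy of the latch. */
struct clock_params {
	_Atomic uint64_t epoch_ns;  /* nanoseconds at the last update */
	_Atomic uint64_t epoch_cyc; /* counter value at the last update */
};

/* Two copies of the parameters, selected by the low bit of seq. */
struct clock_latch {
	atomic_uint seq;
	struct clock_params copy[2];
};

/* Writer: steer readers to copy[1], update copy[0], then the reverse. */
static void latch_update(struct clock_latch *l, uint64_t ns, uint64_t cyc)
{
	atomic_fetch_add(&l->seq, 1); /* odd: readers use copy[1] */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->copy[0].epoch_ns, ns, memory_order_relaxed);
	atomic_store_explicit(&l->copy[0].epoch_cyc, cyc, memory_order_relaxed);

	atomic_fetch_add(&l->seq, 1); /* even: readers use copy[0] */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->copy[1].epoch_ns, ns, memory_order_relaxed);
	atomic_store_explicit(&l->copy[1].epoch_cyc, cyc, memory_order_relaxed);

	/* A single writer starting from zero always leaves seq even. */
	assert((atomic_load_explicit(&l->seq, memory_order_relaxed) & 1u) == 0);
}

/* Reader: never blocks; it retries only if seq moved during the read. */
static uint64_t latch_read_ns(struct clock_latch *l, uint64_t now_cyc,
			      uint64_t ns_per_cyc)
{
	unsigned int seq;
	uint64_t ns, cyc;

	do {
		seq = atomic_load_explicit(&l->seq, memory_order_acquire);
		struct clock_params *p = &l->copy[seq & 1u];

		ns = atomic_load_explicit(&p->epoch_ns, memory_order_relaxed);
		cyc = atomic_load_explicit(&p->epoch_cyc, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&l->seq, memory_order_relaxed) != seq);

	return ns + (now_cyc - cyc) * ns_per_cyc;
}
```

Because the writer only ever modifies the copy that readers are currently steered away from, a reader never has to spin waiting for the writer to finish.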

- Fixes:

- Fix a kthread_park() race with wait_woken()

- Fix misc wait_task_inactive() bugs unearthed by the -rt merge:
- Fix UP PREEMPT bug by unifying the SMP and UP implementations.
- Fix task_struct::saved_state handling.

- Fix various rq clock update bugs, unearthed by turning on the rq clock
debugging code.

- Fix the PSI WINDOW_MIN_US trigger limit, which was easy to circumvent by
creating enough cgroups, by removing the limit and restricting trigger
creation to writers of the PSI files that have CAP_SYS_RESOURCE.
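For reference, a PSI trigger is created by writing "some <stall us> <window us>" (or "full ...") to a pressure file such as /proc/pressure/memory or a cgroup's memory.pressure, then polling the file descriptor for POLLPRI, as described in Documentation/accounting/psi.rst. A minimal sketch of the registration step; psi_trigger_open() is a hypothetical helper, and opening the file for writing may require CAP_SYS_RESOURCE:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: register a "some" PSI trigger on a pressure file.
 * The kernel signals POLLPRI on the returned fd when tasks were stalled
 * for at least stall_us within a window_us time window.
 * Returns the fd, or -1 on error.
 */
static int psi_trigger_open(const char *path, unsigned int stall_us,
			    unsigned int window_us)
{
	char buf[64];
	int len = snprintf(buf, sizeof(buf), "some %u %u", stall_us, window_us);
	int fd;

	assert(len > 0 && (size_t)len < sizeof(buf));

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd < 0)
		return -1;

	/* The psi.rst example writes the string including its NUL byte. */
	if (write(fd, buf, (size_t)len + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would then poll() the returned fd for POLLPRI. With this change the old 500ms minimum window size no longer applies.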

- Propagate SMT flags in the topology when removing a degenerate domain

- Fix a grub_reclaim() calculation bug in the deadline scheduler code
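For reference, the GRUB rule in the form used after this fix decreases a running task's runtime as dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt, where u is the task's utilization, Umax the maximum reclaimable utilization, Uinact the per-runqueue inactive utilization and Uextra the extra reclaimable utilization. A fixed-point sketch of that computation; grub_charge() and its parameters are a hypothetical simplification of grub_reclaim(), and BW_SHIFT/RATIO_SHIFT are assumed to match the kernel's 20 and 8:

```c
#include <assert.h>
#include <stdint.h>

#define BW_SHIFT    20 /* utilizations: 1.0 == 1 << BW_SHIFT */
#define RATIO_SHIFT 8  /* 1 / Umax:     1.0 == 1 << RATIO_SHIFT */

/*
 * Hypothetical simplification of grub_reclaim(): runtime to charge for
 * delta ns of execution.  u, u_max, u_inact and u_extra are utilizations
 * scaled by 2^BW_SHIFT.
 */
static uint64_t grub_charge(uint64_t delta, uint64_t u, uint64_t u_max,
			    uint64_t u_inact, uint64_t u_extra)
{
	uint64_t u_act, ratio;

	assert(u > 0 && u <= u_max);

	/* 1 / Umax in RATIO_SHIFT fixed point. */
	ratio = ((uint64_t)1 << (BW_SHIFT + RATIO_SHIFT)) / u_max;

	/*
	 * max{u, Umax - Uinact - Uextra}, written as a comparison so the
	 * unsigned subtraction cannot underflow.
	 */
	if (u_inact + u_extra > u_max - u)
		u_act = u;
	else
		u_act = u_max - u_inact - u_extra;

	/* Divide by Umax, then scale the elapsed time. */
	u_act = (u_act * ratio) >> RATIO_SHIFT;
	return (delta * u_act) >> BW_SHIFT;
}
```

In this sketch a task that has the CPU to itself is charged the full delta even when Umax is below 1.0, because both terms of the max are divided by Umax.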

- In psi_trigger_destroy(), avoid resetting the minimum update period
when it is unnecessary.

- Don't balance a task to its current running CPU in load_balance(),
which was possible on certain NUMA topologies with overlapping
groups.

- Fix the sched-debug printing of rq->nr_uninterruptible

- Cleanups:

- Address various -Wmissing-prototypes warnings, in preparation for
(maybe) enabling this warning in the future.

- Remove unused code

- Mark more functions __init

- Fix shadow-variable warnings

Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Thanks,

Ingo

------------------>
Arnd Bergmann (5):
sched: Hide unused sched_update_scaling()
sched: Add schedule_user() declaration
sched/fair: Hide unused init_cfs_bandwidth() stub
sched: Make task_vruntime_update() prototype visible
sched/fair: Move unused stub functions to header

Arve Hjønnevåg (1):
sched/wait: Fix a kthread_park race with wait_woken()

Chen Yu (1):
x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid processors

Dietmar Eggemann (2):
sched/fair: Refactor CPU utilization functions
sched/fair, cpufreq: Introduce 'runnable boosting'

Hao Jia (3):
sched/core: Fixed missing rq clock update before calling set_rq_offline()
sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()

Miaohe Lin (2):
sched/deadline: remove unused dl_bandwidth
sched/topology: Mark set_sched_topology() __init

Peter Zijlstra (17):
sched: Unconditionally use full-fat wait_task_inactive()
sched: Consider task_struct::saved_state in wait_task_inactive()
x86/sched: Rewrite topology setup
seqlock/latch: Provide raw_read_seqcount_latch_retry()
time/sched_clock: Provide sched_clock_noinstr()
arm64/io: Always inline all of __raw_{read,write}[bwlq]()
arm64/arch_timer: Provide noinstr sched_clock_read() functions
loongarch: Provide noinstr sched_clock_read()
s390/time: Provide sched_clock_noinstr()
math64: Always inline u128 version of mul_u64_u64_shr()
x86/vdso: Fix gettimeofday masking
clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
clocksource: hyper-v: Provide noinstr sched_clock()
x86/tsc: Provide sched_clock_noinstr()
sched/clock: Provide local_clock_noinstr()
cpuidle: Use local_clock_noinstr()
arm64/arch_timer: Fix MMIO byteswap

Ricardo Neri (11):
sched/fair: Move is_core_idle() out of CONFIG_NUMA
sched/fair: Only do asym_packing load balancing from fully idle SMT cores
sched/fair: Simplify asym_packing logic for SMT cores
sched/fair: Let low-priority cores help high-priority busy SMT cores
sched/fair: Keep a fully_busy SMT sched group as busiest
sched/fair: Use the busiest group to set prefer_sibling
sched/fair: Do not even the number of busy CPUs via asym_packing
sched/topology: Check SDF_SHARED_CHILD in highest_flag_domain()
sched/topology: Remove SHARED_CHILD from ASYM_PACKING
x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
x86/sched/itmt: Give all SMT siblings of a core the same priority

Suren Baghdasaryan (1):
psi: remove 500ms min window size limitation for triggers

Tim C Chen (1):
sched/topology: Propagate SMT flags when removing degenerate domain

Tom Rix (1):
sched/fair: Rename variable cpu_util eff_util

Vineeth Pillai (2):
sched/deadline: Fix bandwidth reclaim equation in GRUB
sched/deadline: Update GRUB description in the documentation

Yang Yang (1):
sched/psi: Avoid resetting the min update period when it is unnecessary

Yicong Yang (1):
sched/fair: Don't balance task to its current running CPU

晏艳(采苓) (1):
sched/debug: Correct printing for rq->nr_uninterruptible


Documentation/scheduler/sched-deadline.rst |   5 +-
arch/arm64/include/asm/arch_timer.h        |   8 +-
arch/arm64/include/asm/io.h                |  12 +-
arch/loongarch/include/asm/loongarch.h     |   2 +-
arch/loongarch/kernel/time.c               |   6 +-
arch/s390/include/asm/timex.h              |  13 +-
arch/s390/kernel/time.c                    |   5 +
arch/x86/include/asm/mshyperv.h            |   5 +
arch/x86/include/asm/vdso/gettimeofday.h   |  41 +++-
arch/x86/kernel/itmt.c                     |  23 +-
arch/x86/kernel/kvmclock.c                 |   4 +-
arch/x86/kernel/smpboot.c                  | 102 +++++----
arch/x86/kernel/tsc.c                      |  38 +++-
arch/x86/kvm/x86.c                         |   7 +-
arch/x86/xen/time.c                        |   3 +-
drivers/clocksource/arm_arch_timer.c       |  54 +++--
drivers/clocksource/hyperv_timer.c         |  44 ++--
drivers/cpuidle/cpuidle.c                  |   8 +-
drivers/cpuidle/poll_state.c               |   4 +-
include/clocksource/hyperv_timer.h         |  24 +--
include/linux/kthread.h                    |   1 +
include/linux/math64.h                     |   2 +-
include/linux/rbtree_latch.h               |   2 +-
include/linux/sched.h                      |   7 +-
include/linux/sched/clock.h                |  17 +-
include/linux/sched/sd_flags.h             |   5 +-
include/linux/sched/topology.h             |   2 +-
include/linux/seqlock.h                    |  15 +-
kernel/cgroup/cgroup.c                     |  12 ++
kernel/kthread.c                           |  10 +
kernel/printk/printk.c                     |   2 +-
kernel/sched/clock.c                       |  19 +-
kernel/sched/core.c                        | 278 +++++++++++++-----------
kernel/sched/cpufreq_schedutil.c           |   3 +-
kernel/sched/deadline.c                    |  57 ++---
kernel/sched/debug.c                       |   2 +-
kernel/sched/fair.c                        | 329 ++++++++++++++++++-----------
kernel/sched/psi.c                         |  19 +-
kernel/sched/sched.h                       | 105 +++++----
kernel/sched/topology.c                    |  15 +-
kernel/sched/wait.c                        |   7 +-
kernel/time/sched_clock.c                  |  24 ++-
kernel/time/timekeeping.c                  |   4 +-
43 files changed, 777 insertions(+), 568 deletions(-)