[GIT PULL] Scheduler changes for v6.6

From: Ingo Molnar
Date: Mon Aug 28 2023 - 17:12:38 EST



Linus,

Please pull the latest sched/core git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2023-08-28

# HEAD: 2f88c8e802c8b128a155976631f4eb2ce4f3c805 sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

Scheduler changes for v6.6:

- The biggest change is the introduction of a new iteration of the
SCHED_FAIR interactivity code: the EEVDF ("Earliest Eligible Virtual
Deadline First") scheduler.

Like CFS, EEVDF is a virtual-time scheduler, but with two parameters
(weight and relative deadline) where CFS had weight only.
It completely reworks the base scheduler: placement, preemption,
picking -- everything.

LWN.net, as usual, has a terrific writeup about EEVDF:

https://lwn.net/Articles/925371/

Preemption (both tick and wakeup) is driven by testing against
a fresh pick. Because the tree is now effectively an interval
tree, and the selection is no longer the 'leftmost' task,
over-scheduling is less of a problem. A lot of the CFS
heuristics are removed or replaced by more natural latency-space
parameters & constructs.

In terms of expected performance regressions: we can and will fix
every case where a 'good' workload misbehaves with the new scheduler.
EEVDF inevitably changes workload scheduling in a binary fashion,
hopefully for the better in the overwhelming majority of cases,
but in some cases for the worse, especially in adversarial loads that
got lucky with the previous code, such as some variants of hackbench.
We are trying hard to err on the side of fixing all performance
regressions, but we expect some inevitable post-release iterations
of that process.

- Improve load-balancing on hybrid x86 systems: enable cluster
scheduling (again).

- Improve & fix bandwidth-scheduling on nohz systems.

- Improve bandwidth-throttling.

- Use lock guards to simplify and de-goto-ify control flow.

- Misc improvements, cleanups and fixes.
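
To illustrate the EEVDF pick rule described in the first item above
(weight plus relative deadline, selection no longer the 'leftmost'
task), here is a minimal userspace sketch in C. It is not the kernel
code: struct toy_entity and the toy_*() helpers are hypothetical names,
the deadline formula (vruntime + slice / weight) and the eligibility
test (vruntime at or below the weighted average vruntime) are the
textbook EEVDF definitions assumed for illustration, and a linear scan
stands in for the augmented rbtree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy entity; field names are illustrative, not the kernel's. */
struct toy_entity {
	int64_t vruntime;	/* virtual time consumed so far */
	int64_t deadline;	/* virtual deadline of the current request */
	int64_t weight;		/* relative CPU share, must be > 0 */
};

/* Assumed formula: virtual deadline = vruntime + slice / weight. */
static int64_t toy_deadline(int64_t vruntime, int64_t slice, int64_t weight)
{
	assert(weight > 0);
	return vruntime + slice / weight;
}

/*
 * Entity i is eligible when its vruntime does not exceed the weighted
 * average V = sum(w_j * v_j) / sum(w_j). Cross-multiplying avoids the
 * division: sum_j w_j * (v_j - v_i) >= 0.
 */
static int toy_eligible(const struct toy_entity *q, size_t n, size_t i)
{
	int64_t acc = 0;

	for (size_t j = 0; j < n; j++)
		acc += q[j].weight * (q[j].vruntime - q[i].vruntime);

	return acc >= 0;
}

/*
 * Pick the eligible entity with the earliest virtual deadline.
 * Returns its index, or -1 if the queue is empty.
 */
static int toy_pick(const struct toy_entity *q, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!toy_eligible(q, n, i))
			continue;
		if (best < 0 || q[i].deadline < q[best].deadline)
			best = (int)i;
	}

	return best;
}
```

With three weight-1 entities at vruntime 100, 90 and 140 and slices of
30, 100 and 5, the weighted average is 110, so the third is ineligible.
Of the other two, the first has the earlier deadline (130 vs. 190) and
is picked, even though the second is 'leftmost' by vruntime.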

Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo

------------------>
Chen Yu (1):
sched/topology: Align group flags when removing degenerate domain

Chin Yik Ming (1):
sched/headers: Rename task_struct::state to task_struct::__state in the comments too

Cruz Zhao (1):
sched/core: introduce sched_core_idle_cpu()

Cyril Hrubis (2):
sched/rt: Fix sysctl_sched_rr_timeslice intial value
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset

Johannes Weiner (1):
MAINTAINERS: Add Peter explicitly to the psi section

Josh Don (2):
sched: don't account throttle time for empty groups
sched: add throttled time stat for throttled children

Miaohe Lin (1):
sched/psi: make psi_cgroups_enabled static

Peter Zijlstra (22):
x86/sched: Enable cluster scheduling on Hybrid
sched/debug: Dump domains' sched group flags
sched/fair: Add cfs_rq::avg_vruntime
sched/fair: Remove sched_feat(START_DEBIT)
sched/fair: Add lag based placement
rbtree: Add rb_add_augmented_cached() helper
sched/fair: Implement an EEVDF-like scheduling policy
sched/fair: Commit to lag based placement
sched/smp: Use lag to simplify cross-runqueue placement
sched/fair: Commit to EEVDF
sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice
sched/fair: Propagate enqueue flags into place_entity()
sched: Simplify get_nohz_timer_target()
sched: Simplify sysctl_sched_uclamp_handler()
sched: Simplify: migrate_swap_stop()
sched: Simplify wake_up_if_idle()
sched: Simplify ttwu()
sched: Simplify sched_exec()
sched: Simplify sched_tick_remote()
sched: Simplify try_steal_cookie()
sched: Simplify sched_core_cpu_{starting,deactivate}()
sched/eevdf: Curb wakeup-preemption

Phil Auld (2):
sched, cgroup: Restore meaning to hierarchical_quota
sched/fair: Block nohz tick_stop when cfs bandwidth in use

Randy Dunlap (1):
sched/psi: Select KERNFS as needed

Ricardo Neri (1):
sched/fair: Consider the idle state of the whole core for load balance

Shrikanth Hegde (1):
sched/eevdf/doc: Modify the documented knob to base_slice_ns as well

Tim C Chen (3):
sched/fair: Determine active load balance for SMT sched groups
sched/topology: Record number of cores in sched group
sched/fair: Implement prefer sibling imbalance calculation between asymmetric groups

Vincent Guittot (2):
sched/fair: Stabilize asym cpu capacity system idle cpu selection
sched/fair: remove util_est boosting

Wander Lairson Costa (2):
kernel/fork: beware of __put_task_struct() calling context
sched: avoid false lockdep splat in put_task_struct()


Documentation/scheduler/sched-design-CFS.rst |    2 +-
MAINTAINERS                                  |    1 +
arch/x86/kernel/smpboot.c                    |   11 +-
include/linux/cgroup-defs.h                  |    2 +
include/linux/rbtree_augmented.h             |   26 +
include/linux/sched.h                        |   21 +-
include/linux/sched/task.h                   |   38 +-
init/Kconfig                                 |    1 +
kernel/cgroup/cgroup.c                       |   34 +
kernel/fork.c                                |    8 +
kernel/sched/core.c                          |  496 +++++-----
kernel/sched/debug.c                         |   49 +-
kernel/sched/fair.c                          | 1333 ++++++++++++++------------
kernel/sched/features.h                      |   24 +-
kernel/sched/psi.c                           |    2 +-
kernel/sched/rt.c                            |    5 +-
kernel/sched/sched.h                         |   57 +-
kernel/sched/topology.c                      |   15 +-
kernel/softirq.c                             |    2 +-
19 files changed, 1217 insertions(+), 910 deletions(-)