[PATCH] sched, fair: try to prevent migration thread from preempting non-cfs task

From: Yafang Shao
Date: Tue Jun 15 2021 - 08:16:23 EST


We observed that our latency-sensitive RT tasks are randomly preempted
by the kthread migration/n, whose job is to migrate the running task
from CPUn to a newly idle CPU, the one that woke migration/n up. For
example,

sensing_node-2511 [007] d... 945.351566: sched_switch: prev_comm=sensing_node prev_pid=2511 prev_prio=98 prev_state=S ==> next_comm=cat next_pid=2686 next_prio=120
cat-2686 [007] d... 945.351569: sched_switch: prev_comm=cat prev_pid=2686 prev_prio=120 prev_state=R+ ==> next_comm=sensing_node next_pid=2512 next_prio=98
sensing_node-2516 [004] dn.. 945.351571: sched_wakeup: comm=migration/7 pid=47 prio=0 target_cpu=007
sensing_node-2512 [007] d... 945.351572: sched_switch: prev_comm=sensing_node prev_pid=2512 prev_prio=98 prev_state=R ==> next_comm=migration/7 next_pid=47 next_prio=0
sensing_node-2516 [004] d... 945.351572: sched_switch: prev_comm=sensing_node prev_pid=2516 prev_prio=98 prev_state=S ==> next_comm=sensing_node next_pid=2502 next_prio=98
migration/7-47 [007] d... 945.351580: sched_switch: prev_comm=migration/7 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=sensing_node next_pid=2512 next_prio=98
sensing_node-2502 [004] d... 945.351605: sched_switch: prev_comm=sensing_node prev_pid=2502 prev_prio=98 prev_state=S ==> next_comm=cat next_pid=2686 next_prio=120

When CPU4 wakes migration/7, the CFS task 'cat' is running on CPU7;
'cat' is then preempted by the RT task 'sensing_node', and migration/7
in turn preempts that RT task. The race window lies between

	if (need_active_balance(&env)) {

and

	raw_spin_rq_lock_irqsave(busiest, flags);

To narrow the race, do a last-minute check before waking up the
migration thread.

Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
Cc: Valentin Schneider <valentin.schneider@xxxxxxx>

---

- Prev version
https://lore.kernel.org/lkml/CAKfTPtBd349eyDhA5ThCAHFd83cGMQKb_LDxD4QvyP-cJOBjqA@xxxxxxxxxxxxxx/

- Similar discussion
https://lore.kernel.org/lkml/CAKfTPtBygNcVewbb0GQOP5xxO96am3YeTZNP5dK9BxKHJJAL-g@xxxxxxxxxxxxxx/
---
kernel/sched/fair.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3248e24a90b0..597c7a940746 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9797,6 +9797,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		/* Record that we found at least one task that could run on this_cpu */
 		env.flags &= ~LBF_ALL_PINNED;
 
+		/*
+		 * There is a race between load_balance() deciding to wake
+		 * the migration thread to pull the running CFS task and an
+		 * RT task waking up and preempting that CFS task first; the
+		 * migration thread then preempts the RT task.
+		 * Do a last-minute check before waking the migration thread
+		 * to avoid preempting a latency-sensitive task.
+		 */
+		if (busiest->curr->sched_class != &fair_sched_class) {
+			raw_spin_unlock_irqrestore(&busiest->lock,
+						   flags);
+			goto out;
+		}
+
 		/*
 		 * ->active_balance synchronizes accesses to
 		 * ->active_balance_work. Once set, it's cleared
--
2.17.1