[PATCH] sched: Fix pick_next_task() race condition in core scheduling

From: Chen Yu
Date: Wed Apr 15 2020 - 22:51:07 EST


As Peter mentioned, commit 6e2df0581f56 ("sched: Fix pick_next_task()
vs 'change' pattern race") fixed a race condition caused by rq->lock
being improperly released after put_prev_task(). Backport this fix to
core scheduling's pick_next_task() as well.

Without this fix, Aubrey, Long and I hit a NULL pointer dereference
within one hour when running RDT MBA (Intel Resource Director Technology
Memory Bandwidth Allocation) benchmarks on a 36-core (72 HTs) platform.
The crash is an attempt to dereference a NULL sched_entity:

[ 3618.429053] BUG: kernel NULL pointer dereference, address: 0000000000000160
[ 3618.429039] RIP: 0010:pick_task_fair+0x2e/0xa0
[ 3618.429042] RSP: 0018:ffffc90000317da8 EFLAGS: 00010046
[ 3618.429044] RAX: 0000000000000000 RBX: ffff88afdf4ad100 RCX: 0000000000000001
[ 3618.429045] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88afdf4ad100
[ 3618.429045] RBP: ffffc90000317dc0 R08: 0000000000000048 R09: 0100000000100000
[ 3618.429046] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 3618.429047] R13: 000000000002d080 R14: ffff88afdf4ad080 R15: 0000000000000014
[ 3618.429048] ? pick_task_fair+0x48/0xa0
[ 3618.429048] pick_next_task+0x34c/0x7e0
[ 3618.429049] ? tick_program_event+0x44/0x70
[ 3618.429049] __schedule+0xee/0x5d0
[ 3618.429050] schedule_idle+0x2c/0x40
[ 3618.429051] do_idle+0x175/0x280
[ 3618.429051] cpu_startup_entry+0x1d/0x30
[ 3618.429052] start_secondary+0x169/0x1c0
[ 3618.429052] secondary_startup_64+0xa4/0xb0

With this patch applied, no NULL pointer exception has been observed in
14 hours of testing so far. Although there is no direct evidence that this
fix resolves the issue, it does close a potential race condition.

Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
---
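For reviewers, the ordering change can be pictured with a small userspace
toy model. This is only a sketch under stated assumptions, not kernel code:
every toy_* name is hypothetical, the lock is reduced to a boolean, and the
"remote CPU" is simulated by a direct call made while the lock is dropped.
It only illustrates why balancing (which may release rq->lock) should run
before put_prev_task() rather than after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none are kernel APIs. */
struct toy_rq {
	bool lock_held;		/* stands in for rq->lock */
	bool prev_put;		/* put_prev_task() already ran for prev */
	bool saw_inconsistent;	/* remote CPU saw prev already put */
};

/* What a remote CPU could observe while the lock is dropped. */
static void toy_remote_observer(struct toy_rq *rq)
{
	assert(!rq->lock_held);
	if (rq->prev_put)
		rq->saw_inconsistent = true;
}

/* Balancing may drop and re-take the lock, which opens a window. */
static void toy_balance(struct toy_rq *rq)
{
	rq->lock_held = false;
	toy_remote_observer(rq);
	rq->lock_held = true;
}

static void toy_put_prev_task(struct toy_rq *rq)
{
	rq->prev_put = true;
}

/* Old order: put prev first, then balance (lock may be dropped). */
bool toy_pick_old_order(void)
{
	struct toy_rq rq = { .lock_held = true };

	toy_put_prev_task(&rq);
	toy_balance(&rq);
	return rq.saw_inconsistent;
}

/* New order: balance first, put prev only once the lock stays held. */
bool toy_pick_new_order(void)
{
	struct toy_rq rq = { .lock_held = true };

	toy_balance(&rq);
	toy_put_prev_task(&rq);
	return rq.saw_inconsistent;
}
```

In this model toy_pick_old_order() returns true (the window is exposed)
and toy_pick_new_order() returns false.
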
 kernel/sched/core.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 02495d44870f..ef101a3ef583 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4477,9 +4477,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
return next;
}

-	prev->sched_class->put_prev_task(rq, prev);
-	if (!rq->nr_running)
-		newidle_balance(rq, rf);
+
+#ifdef CONFIG_SMP
+	for_class_range(class, prev->sched_class, &idle_sched_class) {
+		if (class->balance(rq, prev, rf))
+			break;
+	}
+#endif
+	put_prev_task(rq, prev);

smt_mask = cpu_smt_mask(cpu);

--
2.20.1