[PATCH] sched: Avoid spurious lock dependencies

From: Peter Zijlstra
Date: Tue Oct 01 2019 - 05:18:55 EST


On Thu, Sep 26, 2019 at 08:29:34AM -0400, Qian Cai wrote:

> Oh, you were talking about took #3 while holding #2. Anyway, your patch is
> working fine so far. Care to post/merge it officially or do you want me to post
> it?

Does the below adequately describe the situation?

---
Subject: sched: Avoid spurious lock dependencies

While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJECTS is enabled, can end up doing allocations.

This then results in the following lock order:

rq->lock
zone->lock.rlock
batched_entropy_u64.lock

Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.

Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.

Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..1832fc0fbec5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6039,10 +6039,11 @@ void init_idle(struct task_struct *idle, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
+	__sched_fork(0, idle);
+
 	raw_spin_lock_irqsave(&idle->pi_lock, flags);
 	raw_spin_lock(&rq->lock);
 
-	__sched_fork(0, idle);
 	idle->state = TASK_RUNNING;
 	idle->se.exec_start = sched_clock();
 	idle->flags |= PF_IDLE;
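
For anyone who wants to see why the chain in the changelog closes into a
cycle, here is a rough userspace sketch. It is not kernel code and not how
lockdep is implemented; the enum, record_dep() and replay() are hypothetical
names made up for illustration. It only records "B was taken while A was
held" edges and refuses an edge that would complete a loop. It assumes, as
the changelog says, that a wakeup under the batched_entropy lock ends up
taking rq->lock.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the locks in the changelog. */
enum lock_class { RQ_LOCK, ZONE_LOCK, ENTROPY_LOCK, NR_CLASSES };

/* dep[a][b] is true once lock b has been taken while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while held was held". Returns false when the
 * reverse order is already reachable, i.e. the new edge would close a cycle.
 */
static bool record_dep(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}

/*
 * Replay the orderings described in the changelog. Returns true when no
 * inversion shows up. fork_under_rq_lock models the pre-patch init_idle(),
 * where the allocation (zone->lock) could happen under rq->lock.
 */
static bool replay(bool fork_under_rq_lock)
{
	bool ok = true;

	memset(dep, 0, sizeof(dep));

	/* Pre-patch only: __sched_fork() allocating while rq->lock is held. */
	if (fork_under_rq_lock && !record_dep(RQ_LOCK, ZONE_LOCK))
		ok = false;
	/* The allocator takes the batched_entropy lock under zone->lock. */
	if (!record_dep(ZONE_LOCK, ENTROPY_LOCK))
		ok = false;
	/* Assumed: a wakeup under the batched_entropy lock takes rq->lock. */
	if (!record_dep(ENTROPY_LOCK, RQ_LOCK))
		ok = false;
	return ok;
}
```

replay(true) reports the inversion, because rq->lock already reaches the
batched_entropy lock through zone->lock when the wakeup edge is recorded.
replay(false) finds none: with __sched_fork() outside rq->lock, the first
edge is never recorded.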