Re: [PATCH] locking/semaphore: Use wake_q to wake up processes outside lock critical section

From: Waiman Long
Date: Fri Feb 11 2022 - 10:08:21 EST


On 2/11/22 05:51, Peter Zijlstra wrote:
On Thu, Feb 10, 2022 at 12:04:59PM -0500, Waiman Long wrote:
On 2/10/22 05:19, Peter Zijlstra wrote:
I am sorry that I might have stripped out too much for the lockdep splat to
make it understandable. Below is the full lockdep splat:
Right, so please just transcribe the relevant bits instead of including
this massive splat. It really isn't too complicated.

That can be summarized as:

0: 1: 2:
pi_lock rq->lock console_sem
rq->lock console_sem pi_lock

Which is *much* shorter and *much* easier to read.

OK, got it. However, I thought the circular dependency has been explained by that part of the lockdep splat:

[ 9776.459905] Chain exists of:
[ 9776.459906]   (console_sem).lock --> &p->pi_lock --> &rq->__lock

[ 9776.459911]  Possible unsafe locking scenario:

[ 9776.459913]        CPU0                    CPU1
[ 9776.459914]        ----                    ----
[ 9776.459914]   lock(&rq->__lock);
[ 9776.459917] lock(&p->pi_lock);
[ 9776.459919] lock(&rq->__lock);
[ 9776.459921]   lock((console_sem).lock);

This should convey the same information. Though I think I need to describe where those locking sequence happen.

More concerning, that ordering is invalid to begin with, so the above
seems like a very poor justification for this patch.
Which lock ordering are you considered invalid?
1: above. You cannot take a semaphore inside a (raw) spinlock.
You may have been confused by the name "console_sem". A semaphore is basically a count protected by a raw spinlock. So console_sem.lock is actually a raw spinlock. It is perfectly legal to take a raw spinlock after holding another raw spinlock.

The stack trace included in the patch description show the
(console_sem).lock --> &p->pi_lock --> &rq->__lock sequence because of the
wake_up_process() call while holding the console_sem.lock.

The reverse &rq->__lock lock may happen when a printk() statement is called
while holding the rq lock.

In this case, the printk() is triggered by a SCHED_WARN_ON() statement in
update_rq_clock() which don't call printk_deferred and so won't have
LOGLEVEL_SCHED set. I guess there is alternative way to work around this
issue, but moving the process wakeup out from the semaphore spinlock will
solve this problem in case there are other corner cases like that.

I will update the patch description to include this additional information.
The right solution is to burn printk_deferred at the stake and most of
printk along with it (they're working on it).

Hitting that WARN is the real problem, the rest is collateral damage and
I'm really not interested in fixing that.

Yes, hitting the WARN is the cause of this lockdep splat. In the v2 patch, I list the alternatives as either banning WARN inside rq lock or make a WARN that work with rq lock if we are not going to fix the semaphore. So what is your recommendation about handling that?

Cheers,
Longman