Re: [bug report] locking/rtmutex: Return success on deadlock for ww_mutex waiters

From: Peter Zijlstra
Date: Wed Sep 01 2021 - 04:10:55 EST


On Tue, Aug 31, 2021 at 11:21:52AM +0300, Dan Carpenter wrote:
> Hello Peter Zijlstra,

Hi Dan :-)

> This is a semi-automatic email about new static checker warnings.
>
> The patch a055fcc132d4: "locking/rtmutex: Return success on deadlock
> for ww_mutex waiters" from Aug 26, 2021, leads to the following
> Smatch complaint:
>
> kernel/locking/rtmutex.c:756 rt_mutex_adjust_prio_chain()
> error: we previously assumed 'orig_waiter' could be null (see line 644)
>
> kernel/locking/rtmutex.c
> 643 */
> 644 if (orig_waiter && !rt_mutex_owner(orig_lock))
> ^^^^^^^^^^^
> A lot of this code assumes "orig_waiter" can be NULL.
>

> 735 /*
> 736 * [6] check_exit_conditions_2() protected by task->pi_lock and
> 737 * lock->wait_lock.
> 738 *
> 739 * Deadlock detection. If the lock is the same as the original
> 740 * lock which caused us to walk the lock chain or if the
> 741 * current lock is owned by the task which initiated the chain
> 742 * walk, we detected a deadlock.
> 743 */
> 744 if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> ^^^^^^^^^^^^^^^^^
> This might mean it's a false positive, but Smatch isn't clever enough to
> figure it out. And I'm stupid too! Plus lazy... and ugly.
>
> 745 ret = -EDEADLK;
> 746
> 747 /*
> 748 * When the deadlock is due to ww_mutex; also see above. Don't
> 749 * report the deadlock and instead let the ww_mutex wound/die
> 750 * logic pick which of the contending threads gets -EDEADLK.
> 751 *
> 752 * NOTE: assumes the cycle only contains a single ww_class; any
> 753 * other configuration and we fail to report; also, see
> 754 * lockdep.
> 755 */
> 756 if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
> ^^^^^^^^^^^^^^^^^^^
> Unchecked dereference.


This is difficult... and I'm glad you flagged it. The normal de-boost
path is through rt_mutex_adjust_prio() and that has: .orig_lock == NULL
&& .orig_waiter == NULL. And as such it would never trigger the above
case.

However, there is remove_waiter(), which is called on rt_mutex_lock()'s
failure paths. It passes a non-NULL .orig_lock while .orig_waiter is still
NULL, and as such it *could* conceivably trigger this.

Let me figure out what the right thing to do is.

Thanks!