Re: Possible race in rt_mutex_adjust_prio_chain

From: Henry Wu
Date: Thu Jul 06 2023 - 22:27:37 EST


Hi, Peter.

Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote on Fri, Jul 7, 2023 at 03:58:
>
> Current notes...
>
> We hold [L] from 5-13; we hold [P1] 1-8 and [P2] 10-13
>
> P1 - blocked task
> P2 - lock owner
>
> 7 holds [L]+[P1]
> modifies the values, which, temporarily, messes up the pi_waiters tree
>
> 11 holds [L]+[P2]
> requeues the waiter on pi_waiter, restoring pi_waiters tree
>
> pi_waiters is modified by:
>
> - rt_mutex_{en,de}queue_pi(); which are used:
>
> - [11] (holds [L]+[P2])
> - try_to_take_rt_mutex() [L]+[P3] ?!? (P3 will be owner,
> but is not yet)
> - task_blocks_on_rt_mutex() [L]+[P2]
> - mark_wakeup_next_waiter() [L]+[P2] (current is owner,
> gives up lock)
> - remove_waiter() [L]+[P2]
>
> pi_waiters is used by:
>
> - task_top_pi_waiter(), asserts [P], this is used:
>
> - rt_mutex_adjust_prio(), which asserts [P2] (*), is used:
>
> - [11] (holds [L])
> - task_blocks_on_rt_mutex() (holds [L])
> - mark_wakeup_next_waiter() (holds [L])
> - remove_waiter() (holds [L])
>
> (*)(see patch below -- adding more assertions)
>
> - [3] -- does *NOT* hold [L], does hold [P]
>
>
> Now, [3] doesn't hold [L], but since 'all' modifications are done under
> [L]+[P], just holding [P] should provide a stable read. Except [7], that
> messes up the tree while not holding (the right!) [P].
>
> It's late, I'll have to continue staring at this tomorrow!
>

Your analysis is correct! Let me quote part of my previous reply,
reformatted for clarity:

CPU0                                  CPU1
.........                             ..........
rt_mutex_adjust_prio_chain(C)
                                      rt_mutex_adjust_prio_chain(B)
  ......
  [7] update waiter->prio
  (Now A's pi_waiters rbtree is
   corrupted temporarily)
  ......
                                      [11] enqueue operation may select
                                      an insert position according to the
                                      corrupted waiter node CPU0 created.
  [11] Even though we fix the
  corrupted waiter now, we are
  not sure about pi_waiters'
  sanity, because the other CPU
  may have created a new invariant
  violation based on ours.