Re: [PATCH] lock/lockdep: Add missing graph_unlock in validate_chain

From: Xuewen Yan
Date: Thu Jan 04 2024 - 23:46:57 EST


Hi

On Fri, Jan 5, 2024 at 3:44 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>
> Hi,
>
> On Thu, Jan 04, 2024 at 01:40:30PM +0800, Xuewen Yan wrote:
> > lookup_chain_cache_add() acquires the graph lock, but
> > validate_chain() does not release it before returning 0.
> >
>
> Thanks for looking into this, a few comments below:
>
> > So add graph_unlock() before returning 0.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@xxxxxxxxxx>
> > ---
> > kernel/locking/lockdep.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 151bd3de5936..24995e1ebc62 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
> > */
> > int ret = check_deadlock(curr, hlock);
> >
> > - if (!ret)
> > + if (!ret) {
> > + graph_unlock();
>
> Note that when check_deadlock() returns 0, there is a
> print_deadlock_bug() before the return, so I think it covers the
> graph_unlock() (see debug_locks_off_graph_unlock()).

Yes, I did not read the details of check_deadlock() carefully.
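
To check my understanding, here is a rough userspace model of that
pattern. It is only a sketch: every toy_* name is hypothetical and the
bodies are simplified stand-ins, not the code in kernel/locking/lockdep.c.
It shows a reporter that drops the graph lock itself, so the caller can
return 0 without a separate graph_unlock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for lockdep state; not the kernel's definitions. */
static bool toy_graph_locked;
static bool toy_debug_locks = true;

static void toy_graph_lock(void)
{
	assert(!toy_graph_locked);
	toy_graph_locked = true;
}

static void toy_graph_unlock(void)
{
	assert(toy_graph_locked);
	toy_graph_locked = false;
}

/* Models debug_locks_off_graph_unlock(): turn checking off, drop the lock. */
static bool toy_debug_locks_off_graph_unlock(void)
{
	bool was_on = toy_debug_locks;

	toy_debug_locks = false;
	toy_graph_unlock();
	return was_on;
}

/* Models a print_*() reporter: the graph lock is released in here. */
static void toy_print_deadlock_bug(void)
{
	if (!toy_debug_locks_off_graph_unlock())
		return;
	fprintf(stderr, "toy: possible recursive locking detected\n");
}

/* Models check_deadlock(): 0 is returned only after the reporter ran. */
static int toy_check_deadlock(bool recursive)
{
	if (recursive) {
		toy_print_deadlock_bug();
		return 0;
	}
	return 1;
}
```

In this model, calling toy_graph_lock() and then toy_check_deadlock(true)
returns 0 with toy_graph_locked already false, which is the behaviour you
describe.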

>
> > return 0;
> > + }
> > +
> > /*
> > * Add dependency only if this lock is not the head
> > * of the chain, and if the new lock introduces no more
> > @@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
> > * serializes nesting locks), see the comments for
> > * check_deadlock().
> > */
> > - if (!chain_head && ret != 2) {
> > - if (!check_prevs_add(curr, hlock))
> > - return 0;
> > + if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
> > + graph_unlock();
>
> This part is interesting, usually when an internal function in lockdep
> returns 0, it means there is an error (either a deadlock or internal
> error), and that means a print_*() would be called, and the graph lock
> will be unlocked in that print_*(). However, in check_prevs_add() there
> is one condition where it will return 0 without any print_*(), that is:
>
>
> in check_prev_add():
>
> /* <prev> is not found in <next>::locks_before */
> return 0;
>
> it's an internal error where <next> is in the <prev>::locks_after list
> but <prev> is not in the <next>::locks_before list, which should seldom
> happen: it's dead code. If you put a graph_unlock() before that return,
> I think it covers all the cases, unless I'm missing something subtle.

If this is the only path that returns without unlocking, it is indeed
better to put graph_unlock() there.
I will change the patch in v2.
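
As a rough illustration of what I have in mind, here is a small
userspace model (a sketch only: the sketch_* names are hypothetical, the
return values are simplified, and this is not the v2 patch). The one
silent error path releases the graph lock itself, so the caller can
return 0 directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the graph lock; not the kernel's implementation. */
static bool sketch_graph_locked;

static void sketch_graph_lock(void)   { sketch_graph_locked = true; }
static void sketch_graph_unlock(void) { sketch_graph_locked = false; }

/*
 * Models the one path in check_prev_add() that returns 0 without calling
 * a print_*() helper. The caller holds the graph lock; when 0 is returned
 * the lock has already been dropped.
 */
static int sketch_check_prev_add(bool prev_in_locks_before)
{
	assert(sketch_graph_locked);
	if (!prev_in_locks_before) {
		/* <prev> is not found in <next>::locks_before */
		sketch_graph_unlock();	/* the proposed addition */
		return 0;
	}
	return 1;
}

/* Models the tail of validate_chain(): a 0 return must not leak the lock. */
static int sketch_validate_chain(bool prev_in_locks_before)
{
	sketch_graph_lock();
	if (!sketch_check_prev_add(prev_in_locks_before))
		return 0;	/* lock already released by the callee */
	sketch_graph_unlock();
	return 1;
}
```

With this shape, sketch_validate_chain(false) returns 0 and
sketch_graph_locked is false afterwards, the same as on the success path.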

>
> Are you hitting a real issue, or was this found by code reading?

Indeed, we hit a real issue:
One CPU did not call graph_unlock() and, as a result, caused a deadlock
with the other CPUs, because any CPU calling raw_spin_lock() would try
to take the graph_lock first.
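
The failure mode can be modelled in userspace roughly as follows. This
is only a sketch under my assumptions: the demo_* names are hypothetical,
a C11 atomic_flag stands in for the graph lock, and a failed trylock
stands in for a CPU that would otherwise spin forever.

```c
#include <assert.h>	/* used by the checks in the usage note */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the graph lock. */
static atomic_flag demo_graph_lock_flag = ATOMIC_FLAG_INIT;

static bool demo_graph_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&demo_graph_lock_flag);
}

static void demo_graph_unlock(void)
{
	atomic_flag_clear(&demo_graph_lock_flag);
}

/* Models a validation path returning 0; 'leak' selects the buggy behaviour. */
static int demo_validate(bool leak)
{
	if (!demo_graph_trylock())
		return -1;	/* a real CPU would spin here */
	if (!leak)
		demo_graph_unlock();
	return 0;
}

/* Models any other CPU that needs the graph lock while taking a lock. */
static bool demo_other_cpu_can_proceed(void)
{
	if (!demo_graph_trylock())
		return false;
	demo_graph_unlock();
	return true;
}
```

After demo_validate(true), demo_other_cpu_can_proceed() returns false
until someone calls demo_graph_unlock(); after demo_validate(false) it
returns true.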

Thanks!

--
BR
xuewen

>
> Regards,
> Boqun
>
> > + return 0;
> > }
> >
> > graph_unlock();
> > --
> > 2.25.1
> >
>