Re: possible deadlock in _destroy_id

From: Jason Gunthorpe
Date: Wed Nov 25 2020 - 19:25:10 EST


On Wed, Nov 25, 2020 at 08:48:32AM +0200, Leon Romanovsky wrote:
> > commit c80a0c52d85c49a910d0dc0e342e8d8898677dc0
> > Author: Leon Romanovsky <leon@xxxxxxxxxx>
> > Date: Wed Nov 4 16:40:07 2020 +0200
> >
> > RDMA/cma: Add missing error handling of listen_id
> >
> > Don't silently continue if rdma_listen() fails; instead, destroy the previously
> > created CM_ID and return an error to the caller.
> >
> > rdma_destroy_id() can't be called while holding the global lock
> >
> > This is quite hard to fix. I came up with this ugly thing:
> >
> > From 8e6568f99fbe4bf734cc4e5dcda987e4ae118bdd Mon Sep 17 00:00:00 2001
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Date: Wed, 18 Nov 2020 09:33:23 -0400
> > Subject: [PATCH] RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all()
> > error unwind
> >
> > rdma_destroy_id() cannot be called under &lock - we must instead keep the
> > error'd ID around until &lock can be released, then destroy it.
> >
> > This is complicated by the usual way listen IDs are destroyed through
> > cma_process_remove() which can run at any time and will asynchronously
> > destroy the same ID.
> >
> > Remove the ID from visibility of cma_process_remove() before going down the
> > destroy path outside the locking.
> >
> > Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id")
> > Reported-by: syzbot+1bc48bf7f78253f664a9@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > drivers/infiniband/core/cma.c | 25 ++++++++++++++++++-------
> > 1 file changed, 18 insertions(+), 7 deletions(-)
> >
>
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxx>

Okay, applied to for-next, thanks

Jason
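The fix described in the quoted patch follows a common pattern: unlink the failed ID from the shared list while still holding the lock, so a cma_process_remove()-style walker can no longer find it. Then drop the lock, and only after that call the destroy routine, which takes the lock itself. The code below is a minimal userspace sketch of that pattern. It assumes a pthread mutex in place of cma.c's global &lock and a singly linked list in place of the list cma_process_remove() walks. All names (listen_id, listen_list, unlink_locked, destroy_id, do_listen, listen_on_all, process_remove) are hypothetical, and it is not the actual cma.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: "lock" models cma.c's global &lock and
 * "listen_list" models the list a cma_process_remove()-like walker sees. */
struct listen_id {
    int id;
    struct listen_id *next;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct listen_id *listen_list;
static int destroyed; /* demo counter only; not updated under the lock */

/* Unlink id from listen_list. Caller must hold lock. Returns 1 if found. */
static int unlink_locked(struct listen_id *id)
{
    for (struct listen_id **pp = &listen_list; *pp; pp = &(*pp)->next) {
        if (*pp == id) {
            *pp = id->next;
            id->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Models rdma_destroy_id(): it takes the lock itself, so calling it while
 * already holding the (non-recursive) lock would self-deadlock. */
static void destroy_id(struct listen_id *id)
{
    pthread_mutex_lock(&lock);
    unlink_locked(id); /* no-op if the caller already unlinked it */
    pthread_mutex_unlock(&lock);
    free(id);
    destroyed++;
}

/* Hypothetical listen step that fails for the ID numbered fail_at. */
static int do_listen(const struct listen_id *id, int fail_at)
{
    return id->id == fail_at ? -1 : 0;
}

/* Create n listen IDs under the lock. On the first failure, hide the bad ID
 * from process_remove() by unlinking it, release the lock, then destroy it. */
static int listen_on_all(int n, int fail_at)
{
    struct listen_id *to_destroy = NULL;
    int ret = 0;

    pthread_mutex_lock(&lock);
    for (int i = 0; i < n; i++) {
        struct listen_id *id = calloc(1, sizeof(*id));
        if (!id) {
            ret = -1;
            break;
        }
        id->id = i;
        id->next = listen_list;
        listen_list = id;
        if (do_listen(id, fail_at)) {
            unlink_locked(id); /* no longer visible to process_remove() */
            to_destroy = id;   /* keep it until the lock is released */
            ret = -1;
            break;
        }
    }
    pthread_mutex_unlock(&lock);

    if (to_destroy)
        destroy_id(to_destroy); /* safe: lock is not held here */
    return ret;
}

/* Models cma_process_remove(): claim each visible ID by unlinking it under
 * the lock, then destroy it after dropping the lock. */
static void process_remove(void)
{
    for (;;) {
        pthread_mutex_lock(&lock);
        struct listen_id *id = listen_list;
        if (id)
            unlink_locked(id);
        pthread_mutex_unlock(&lock);
        if (!id)
            break;
        destroy_id(id);
    }
}
```

Because an ID is always unlinked under the lock before anyone destroys it, exactly one path (the error unwind or the remove walker) ends up owning it. That avoids both the self-deadlock and a double destroy of the same ID. For example, listen_on_all(3, 1) returns -1 after destroying ID 1 outside the lock, leaving only ID 0 for process_remove() to clean up.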