Re: [syzbot] [rdma?] INFO: trying to register non-static key in skb_dequeue (2)

From: Zhu Yanjun
Date: Tue May 23 2023 - 02:05:12 EST


On Tue, May 23, 2023 at 1:56 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
>
>
>
> On 5/23/23 13:52, Zhu Yanjun wrote:
> > On Tue, May 23, 2023 at 1:44 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>
> >>
> >> On 5/23/23 13:18, Zhu Yanjun wrote:
> >>> On Tue, May 23, 2023 at 1:08 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >>>> On Tue, May 23, 2023 at 12:29 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >>>>> On Tue, May 23, 2023 at 12:10 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 5/23/23 12:02, Zhu Yanjun wrote:
> >>>>>>> On Tue, May 23, 2023 at 11:47 AM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >>>>>>>> On Tue, May 23, 2023 at 10:26 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >>>>>>>>> On 5/23/23 10:13, syzbot wrote:
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> syzbot tried to test the proposed patch but the build/boot failed:
> >>>>>>>>>>
> >>>>>>>>>> failed to apply patch:
> >>>>>>>>>> checking file drivers/infiniband/sw/rxe/rxe_qp.c
> >>>>>>>>>> patch: **** unexpected end of file in patch
> >>>>>>>> This is not the root cause. The fix is not good.
> >>>>>>> This problem is about "INFO: trying to register non-static key. The
> >>>>>>> code is fine but needs lockdep annotation, or maybe"
> >>>>> This warning is from "lock is not initialized". This is a
> >>>>> use-before-initialized problem.
> >>>>> The correct fix is to initialize the lock that is complained before it is used.
> >>>>>
> >>>>> Zhu Yanjun
> >>>> Based on the call trace, the followings are the order of this call trace.
> >>>>
> >>>> 291 /* called by the create qp verb */
> >>>> 292 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp,
> >>>> struct rxe_pd *pd,
> >>>> 297 {
> >>>> ...
> >>>> 317 rxe_qp_init_misc(rxe, qp, init);
> >>>> ...
> >>>> 322
> >>>> 323 err = rxe_qp_init_resp(rxe, qp, init, udata, uresp);
> >>>> 324 if (err)
> >>>> 325 goto err2; <--- error
> >>>>
> >>>> ...
> >>>>
> >>>> 334 err2:
> >>>> 335 rxe_queue_cleanup(qp->sq.queue); <--- Goto here
> >>>> 336 qp->sq.queue = NULL;
> >>>>
> >>>> In rxe_qp_init_resp, the error occurs before skb_queue_head_init.
> >>>> So this call trace appeared.
> >>> 250 static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
> >>> 254 {
> >>> ...
> >>> 264
> >>> 265 type = QUEUE_TYPE_FROM_CLIENT;
> >>> 266 qp->rq.queue = rxe_queue_init(rxe, &qp->rq.max_wr,
> >>> 267 wqe_size, type);
> >>> 268 if (!qp->rq.queue)
> >>> 269 return -ENOMEM; <---Error here
> >>> 270
> >>>
> >>> ...
> >>>
> >>> 282 skb_queue_head_init(&qp->resp_pkts); <-this is not called.
> >>> ...
> >>> This will make spin_lock of resp_pkts is used before initialized.
> >> IMHO, the above is same as
> >>
> >>> Which is caused by "skb_queue_head_init(&qp->resp_pkts)" is not called
> >>> given rxe_qp_init_resp returns error, but the cleanup still trigger the
> >>> chain.
> >>>
> >>> rxe_qp_do_cleanup -> rxe_completer -> drain_resp_pkts ->
> >>> skb_dequeue(&qp->resp_pkts)
> >> my previous analysis. If not, could you provide another better way to
> >> fix it?
> > Move the initialization to the beginning. This can fix this problem.
> > See below:
> >
> > "
> > diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c
> > b/drivers/infiniband/sw/rxe/rxe_qp.c
> > index c5451a4488ca..22ef6188d7b1 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> > @@ -176,6 +176,9 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe,
> > struct rxe_qp *qp,
> > spin_lock_init(&qp->rq.producer_lock);
> > spin_lock_init(&qp->rq.consumer_lock);
> >
> > + skb_queue_head_init(&qp->req_pkts);
> > + skb_queue_head_init(&qp->resp_pkts);
> > +
> > atomic_set(&qp->ssn, 0);
> > atomic_set(&qp->skb_out, 0);
> > }
> > @@ -234,8 +237,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe,
> > struct rxe_qp *qp,
> > qp->req.opcode = -1;
> > qp->comp.opcode = -1;
> >
> > - skb_queue_head_init(&qp->req_pkts);
> > -
> > rxe_init_task(&qp->req.task, qp, rxe_requester);
> > rxe_init_task(&qp->comp.task, qp, rxe_completer);
> >
> > @@ -279,8 +280,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe,
> > struct rxe_qp *qp,
> > }
> > }
> >
> > - skb_queue_head_init(&qp->resp_pkts);
> > -
> > rxe_init_task(&qp->resp.task, qp, rxe_responder);
> >
> > qp->resp.opcode = OPCODE_NONE;
> > "
>
> It is weird to me that init them in init_misc instead of init_req/resp,
> given they
> are dedicated/used for the different purpose. But just my 0.02$.

There are some initialization problems in qp init. This needs
refactoring the related functions to fix all the problems.
Currently I am working on this. You know, this is not an easy task.
It will take me a lot of time and effort.
Now I use init_misc to initialize the related variables just as I did
in the past.
This is because init_misc is designed to initialize some variables.
And it will not cause the similar use-before-initialization problems.

Zhu Yanjun
>
> Guoqing