Re: [syzbot] [rdma?] INFO: trying to register non-static key in skb_dequeue (2)

From: Guoqing Jiang
Date: Tue May 23 2023 - 01:57:03 EST




On 5/23/23 13:52, Zhu Yanjun wrote:
On Tue, May 23, 2023 at 1:44 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:


On 5/23/23 13:18, Zhu Yanjun wrote:
On Tue, May 23, 2023 at 1:08 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
On Tue, May 23, 2023 at 12:29 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
On Tue, May 23, 2023 at 12:10 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:

On 5/23/23 12:02, Zhu Yanjun wrote:
On Tue, May 23, 2023 at 11:47 AM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
On Tue, May 23, 2023 at 10:26 AM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
On 5/23/23 10:13, syzbot wrote:
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to apply patch:
checking file drivers/infiniband/sw/rxe/rxe_qp.c
patch: **** unexpected end of file in patch
This is not the root cause. The fix is not good.
This problem is about "INFO: trying to register non-static key. The
code is fine but needs lockdep annotation, or maybe"
This warning is from "lock is not initialized". This is a
use-before-initialized problem.
The correct fix is to initialize the lock that is complained before it is used.

Zhu Yanjun
Based on the call trace, the followings are the order of this call trace.

291 /* called by the create qp verb */
292 int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp,
struct rxe_pd *pd,
297 {
...
317 rxe_qp_init_misc(rxe, qp, init);
...
322
323 err = rxe_qp_init_resp(rxe, qp, init, udata, uresp);
324 if (err)
325 goto err2; <--- error

...

334 err2:
335 rxe_queue_cleanup(qp->sq.queue); <--- Goto here
336 qp->sq.queue = NULL;

In rxe_qp_init_resp, the error occurs before skb_queue_head_init.
So this call trace appeared.
250 static int rxe_qp_init_resp(struct rxe_dev *rxe, struct rxe_qp *qp,
254 {
...
264
265 type = QUEUE_TYPE_FROM_CLIENT;
266 qp->rq.queue = rxe_queue_init(rxe, &qp->rq.max_wr,
267 wqe_size, type);
268 if (!qp->rq.queue)
269 return -ENOMEM; <---Error here
270

...

282 skb_queue_head_init(&qp->resp_pkts); <-this is not called.
...
This will make spin_lock of resp_pkts is used before initialized.
IMHO, the above is same as

Which is caused by "skb_queue_head_init(&qp->resp_pkts)" is not called
given rxe_qp_init_resp returns error, but the cleanup still trigger the
chain.

rxe_qp_do_cleanup -> rxe_completer -> drain_resp_pkts ->
skb_dequeue(&qp->resp_pkts)
my previous analysis. If not, could you provide another better way to
fix it?
Move the initialization to the beginning. This can fix this problem.
See below:

"
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c
b/drivers/infiniband/sw/rxe/rxe_qp.c
index c5451a4488ca..22ef6188d7b1 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -176,6 +176,9 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe,
struct rxe_qp *qp,
spin_lock_init(&qp->rq.producer_lock);
spin_lock_init(&qp->rq.consumer_lock);

+ skb_queue_head_init(&qp->req_pkts);
+ skb_queue_head_init(&qp->resp_pkts);
+
atomic_set(&qp->ssn, 0);
atomic_set(&qp->skb_out, 0);
}
@@ -234,8 +237,6 @@ static int rxe_qp_init_req(struct rxe_dev *rxe,
struct rxe_qp *qp,
qp->req.opcode = -1;
qp->comp.opcode = -1;

- skb_queue_head_init(&qp->req_pkts);
-
rxe_init_task(&qp->req.task, qp, rxe_requester);
rxe_init_task(&qp->comp.task, qp, rxe_completer);

@@ -279,8 +280,6 @@ static int rxe_qp_init_resp(struct rxe_dev *rxe,
struct rxe_qp *qp,
}
}

- skb_queue_head_init(&qp->resp_pkts);
-
rxe_init_task(&qp->resp.task, qp, rxe_responder);

qp->resp.opcode = OPCODE_NONE;
"

It is weird to me that init them in init_misc instead of init_req/resp, given they
are dedicated/used for the different purpose. But just my 0.02$.

Guoqing