Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

From: wangyufen
Date: Mon Nov 21 2022 - 22:28:38 EST




在 2022/11/22 10:13, Jason Gunthorpe 写道:
On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
On Fri, 18 Nov 2022 at 12:39, syzbot
<syzbot+5e70d01ee8985ae62a3b@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5e70d01ee8985ae62a3b@xxxxxxxxxxxxxxxxxxxxxxxxx

iwpm_register_pid: Unable to send a nlmsg (client = 2)
infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
unregister_netdevice: waiting for vlan0 to become free. Usage count = 2

+RDMA maintainers

There are 4 reproducers and all contain:

r0 = socket$nl_rdma(0x10, 0x3, 0x14)
sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)

Also the preceding print looks related (a bug in the error handling
path there?):

infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98

I'm pretty sure it is an rxe bug

ib_device_set_netdev() will hold the netdev until the caller destroys
the ib_device

rxe calls it during rxe_register_device() because the user asked for a
stacked ib_device on top of the netdev

Presumably rxe needs to have a notifier to also self destroy the rxe
device if the underlying net device is to be destroyed?

Can someone from rxe check into this?

The following patch may fix the issue:

--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
return 0;
err:
id_priv->backlog = 0;
+ if (id_priv->cma_dev)
+ cma_release_dev(id_priv);
+
/*
* All the failure paths that lead here will not allow the req_handler's
* to have run.



The causes are as follows:

rdma_listen()
rdma_bind_addr()
cma_acquire_dev_by_src_ip()
cma_attach_to_dev()
_cma_attach_to_dev()
cma_dev_get()

cma_check_port()
<--The return value is not zero, goto err

err:
<-- The error handling here is missing the operation of cma_release_dev.


Jason