Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)

From: Jason Gunthorpe
Date: Mon Nov 21 2022 - 21:13:21 EST


On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
> On Fri, 18 Nov 2022 at 12:39, syzbot
> <syzbot+5e70d01ee8985ae62a3b@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
> > git tree: net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000
> >
> > Bisection is inconclusive: the issue happens on the oldest tested release.
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+5e70d01ee8985ae62a3b@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > iwpm_register_pid: Unable to send a nlmsg (client = 2)
> > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
> > unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
>
> +RDMA maintainers
>
> There are 4 reproducers and all contain:
>
> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>
> Also the preceding print looks related (a bug in the error handling
> path there?):
>
> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98

I'm pretty sure it is an rxe bug.

ib_device_set_netdev() will hold a reference on the netdev until the
caller destroys the ib_device.

rxe calls it during rxe_register_device() because the user asked for a
stacked ib_device on top of the netdev.

Presumably rxe needs a netdev notifier so that it also destroys its own
rxe device when the underlying net device is about to be destroyed?

Can someone from rxe check into this?

Jason