Re: INFO: task hung in nbd_ioctl

From: Eric Biggers
Date: Thu Oct 17 2019 - 12:36:39 EST


On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote:
> On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> > >> Hey Josef and nbd list,
> > >>
> > >> I had a question about if there are any socket family restrictions for nbd?
> > >
> > > In normal circumstances, in userspace, the NBD protocol would only be
> > > used over AF_UNIX or AF_INET/AF_INET6.
> > >
> > > There's a bit of confusion because netlink is used by nbd-client to
> > > configure the NBD device, setting things like block size and timeouts
> > > (instead of ioctl which is deprecated). I think you don't mean this
> > > use of netlink?
> >
> > I didn't. It looks like it is just a bad test.
> >
> > For the automated test in this thread the test created a AF_NETLINK
> > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> > for the NBD_DO_IT ioctl.
> >
> > I was not sure if the test creator picked any old socket and it just
> > happened to pick one nbd never supported, or it was trying to simulate
> > sockets that did not support the shutdown method.
> >
> > I attached the automated test that got run (test.c).
>
> I'd say it sounds like a bad test, but I'm not familiar with syzkaller
> nor how / from where it generates these tests. Did someone report a
> bug and then syzkaller wrote this test?
>
> Rich.
>
> > >
> > >> The bug here is that some socket familys do not support the
> > >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> > >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> > >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> > >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug
> > >> we hang like we see below.
> > >>
> > >> I can just remove the flush_workqueue call in that code path since it's
> > >> not needed there, but it leaves the original bug my patch was hitting
> > >> where we leave the recv_work running which can then result in leaked
> > >> resources, or possible use after free crashes and you still get the hang
> > >> if you remove the module.
> > >>
> > >> It looks like we have used kernel_sock_shutdown for a while so I thought
> > >> we might never have supported sockets that did not support the callout.
> > >> Is that correct? If so then I can just add a check for this in
> > >> nbd_add_socket and fix that bug too.
> > >
> > > Rich.
> > >

It's an automatically generated fuzz test.

There's rarely any such thing as a "bad" fuzz test. If userspace can do
something that causes the kernel to crash or hang, it's a kernel bug, with very
few exceptions (e.g. like writing to /dev/mem).

If there are cases that aren't supported, like sockets that don't support a
certain function or whatever, then the code needs to check for those cases and
return an error, not hang the kernel.

- Eric