Re: [Nbd] [PATCH] NBD: replace kill_bdev() with __invalidate_device()

From: Wouter Verhelst
Date: Tue May 24 2016 - 11:17:41 EST


On Thu, May 19, 2016 at 08:35:03AM +0200, Markus Pargmann wrote:
> Hi Wouter,
>
> On Sun, May 15, 2016 at 02:55:39PM +0200, Wouter Verhelst wrote:
> > Hi Markus,
> >
> > On Thu, May 12, 2016 at 11:53:01AM +0200, Markus Pargmann wrote:
> > > On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote:
> > > > However, at some point I agreed with Paul (your predecessor) that when
> > > > this happens due to an error condition (as opposed to it being due to an
> > > > explicit disconnect), the kernel would block all reads from or writes to
> > > > the device, and the client may try to reconnect *from the same
> > > > PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is
> > > > assumed to be connected to the same server; if instead the process
> > > > exits, then the block device is assumed to be dead, will be reset, and
> > > > all pending reads or writes would error.
> > > >
> > > > In principle, this allows for a proper reconnect from userspace if it
> > > > can be done. However, I'm not sure whether this ever worked well or
> > > > whether it was documented, so it's probably fine if you think it should
> > > > be replaced with something else.
> > >
> > > At least I was not aware of this possibility. As far as I know the
> > > previous code even had issues with the signals used to kill on timeouts
> > > which also killed the userspace program sometimes.
> > >
> > > Currently I can't see a code path that supports reconnects. But I may
> > > have removed that accidently in the past.
> >
> > Right. Like I said, I'm not sure if it ever worked well. The user space
> > client has a -persist option that tries to implement it, but I've been
> > getting some bug reports from people who've tried it (although that may
> > have been my fault rather than the kernel's).
> >
> > > > (obviously, userspace reconnecting the device to a different device is
> > > > wrong and should not be done, but that's a case of "if you break it, you
> > > > get to keep both pieces)
> > > >
> > > > At any rate, I think it makes sense for userspace to be given a chance
> > > > to *attempt* to reconnect a device when the connection drops
> > > > unexpectedly.
> > >
> > > Perhaps it would be better to setup the kernel driver explicitly for
> > > that. Perhaps some flag to let the kernel driver know that the client
> > > would like to keep the block device open? In that case the client could
> > > excplicitly use NBD_CLEAR_SOCK to cleanup everything.
> >
> > I'm not sure what you mean by this. Can you clarify?
>
> I meant that it might be better to have a separate way for NBD_DO_IT.
> Something where the client software can directly instruct the kernel to
> keep everything opened in case of an error so that the client may
> reconnect afterwards.
>
> This could be a new ioctl that sets it up, for example 'NBD_PERSISTENT'.

If we're going to add a new ioctl, I think it might be useful to have a
second "flags" type ioctl; one where the client can set options, and
where the return value of the ioctl indicates which options the kernel
understands and will/can honour.

This could then also be used for things like checking whether the kernel
supports structured writes, etc.

> The NBD_DO_IT afterwards would keep everything up and running in case of
> a connection error so that the client could set a new socket using
> NBD_SET_SOCK and reenter using NBD_DO_IT.
>
> For all clients that are not capable of this mechanism or don't use it,
> NBD_DO_IT would clean up everything properly on any error.

Right.

> > > Or perhaps a completely new ioctl that can transmit back some more
> > > information about what failures were seen and whether the blockdevice
> > > was closed or not?
> >
> > The intent was that ioctl(NBD_DO_IT) would return an error when the
> > disconnect was not requested, and would return 0 when the connection
> > dropped due to userspace doing ioctl(NBD_DISCONNECT), since dropping the
> > connection when userspace explicitly asks for it is not an error.
> >
> > drivers/block/nbd.c contains the following:
> >
> > static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
> > unsigned int cmd, unsigned long arg)
> > {
> > [...]
> > case NBD_DO_IT: {
> > [...]
> > if (nbd->disconnect) /* user requested, ignore socket errors */
> > return 0;
> > return error;
> > }
> > [...]
> >
> > so the signalling part of it is at least still there. Whether it works,
> > I haven't tested.
>
> I just looked up the kernel code from 4.0. This code was there as
> well. But the socket and blockdevice were both destroyed before leaving
> the NBD_DO_IT ioctl. So it seems to have never been really persistent.
> Filesystems would have still been killed.
>
> So for a persistent nbd device there is some more code necessary to do
> it.

Okay.

--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12