Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks

From: Demi Marie Obenour
Date: Fri Jun 09 2023 - 12:55:56 EST


On Fri, Jun 09, 2023 at 05:13:45PM +0200, Roger Pau Monné wrote:
> On Thu, Jun 08, 2023 at 11:33:26AM -0400, Demi Marie Obenour wrote:
> > On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> > > On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > > > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, Jun 06, 2023 at 01:01:20PM -0400, Demi Marie Obenour wrote:
> > > > > > On Tue, Jun 06, 2023 at 10:25:47AM +0200, Roger Pau Monné wrote:
> > > > > > > On Tue, May 30, 2023 at 04:31:13PM -0400, Demi Marie Obenour wrote:
> > > > > > > Also, you tie this logic to the "physical-device" watch, which
> > > > > > > strictly implies that the "diskseq" node must be written to xenstore
> > > > > > > before the "physical-device" node. This seems fragile, but I don't
> > > > > > > see much better optiono since the "diskseq" is optional.
> > > > > >
> > > > > > What about including the diskseq in the "physical-device" node? Perhaps
> > > > > > use diskseq@major:minor syntax?
> > > > >
> > > > > Hm, how would you know whether the blkback instance in the kernel
> > > > > supports the diskseq syntax in physical-device?
> > > >
> > > > That’s what the next patch is for 🙂.
> > >
> > > Hm, I think we should separate diskseq support from the notify open
> > > stuff: it's possible a different (non-Linux) backend wants to
> > > implement open notify support but doesn't have diskseq.
> >
> > I like this idea! What about having blkback set diskseq to zero?
> > Userspace could then replace it with the actual value.
>
> I think it would be better if we used a sysfs node, because blkfront
> has no business is knowing whether diskseq is supported by the
> backend or not.
>
> xenstore should be reserved to features exposed between blkfront and
> blkback if possible. I know we haven't been very good at this
> however.
>
> > > > > Can you fetch a disk using a diskseq identifier?
> > > >
> > > > Not yet, although I have considered adding this ability. It would be
> > > > one step towards a “diskseqfs” that userspace could use to open a device
> > > > by diskseq.
> > > >
> > > > > Why I understand that this is an extra safety check in order to assert
> > > > > blkback is opening the intended device, is this attempting to fix some
> > > > > existing issue?
> > > >
> > > > Yes, it is. I have a block script (written in C) that validates the
> > > > device it has opened before passing the information to blkback. It uses
> > > > the diskseq to do this, but for that protection to be complete, blkback
> > > > must also be aware of it.
> > >
> > > But if your block script opens the device, and keeps it open until
> > > blkback has also taken a reference to it, there's no way such device
> > > could be removed and recreated in the window you point out above, as
> > > there's always a reference on it taken?
> >
> > This assumes that the block script is not killed in the meantime,
> > which is not a safe assumption due to timeouts and the OOM killer.
>
> Doesn't seem very reliable to use with delete-on-close either then.

That’s actually the purpose of delete-on-close! It ensures that if the
block script gets killed, the device is automatically cleaned up.

> > > > > I'm not sure I see how the major:minor numbers would point to a
> > > > > different device than the one specified by the toolstack unless the
> > > > > admin explicitly messes with the devices before blkback has got time
> > > > > to open them. But then the admin can already do pretty much
> > > > > everything it wants with the system.
> > > >
> > > > Admins typically refer to e.g. device-mapper devices by name, not by
> > > > major:minor number. If a device is destroyed and recreated right as the
> > > > block script is running, this race condition can occur.
> > >
> > > Right, but what about this device recreation happening after the admin
> > > has written the guest config file but before the call to (lib)xl
> > > happens? blkback would also end up using a different device than
> > > indented, and your proposed approach doesn't fix this. The only way to
> > > solve this would be to reference devices by UUID (iow: diskseq)
> > > directly in the guest config file.
> >
> > That would be a good idea, but it is orthogonal to this patch. My
> > script opens the device and uses various means to check that it did
> > open the correct device. It then passes the diskseq to blkback.
>
> How you do this with losetup? I guess there's an atomic way to setup
> a loop device and get its diskseq?

It can’t be done with losetup. I use a C program that directly
issues ioctls to /dev/loop-control and /dev/loop*. Doing this with
device-mapper requires kernel patches that have been submitted but are
not yet upstream.

> > > Then the block script will open the device by diskseq and pass the
> > > major:minor numbers to blkback.
> >
> > Alternatively, the toolstack could write both the diskseq and
> > major:minor numbers and be confident that it is referring to the
> > correct device, no matter how long ago it got that information.
> > This could be quite useful for e.g. one VM exporting a device to
> > another VM by calling losetup(8) and expecting a human to make a
> > decision based on various properties about the device. In this
> > case there is no upper bound on the race window.
>
> Instead of playing with xenstore nodes, it might be better to simply
> have blkback export on sysfs the diskseq of the opened device, and let
> the block script check whether that's correct or not. That implies
> less code in the kernel side, and doesn't pollute xenstore.

This would require that blkback delay exposing the device to the
frontend until the block script has checked that the diskseq is correct.
Much simpler for the block script to provide the diskseq in xenstore.
If you want to avoid an extra xenstore node, I can make the diskseq part
of the physical-device node.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature