Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks

From: Demi Marie Obenour
Date: Thu Jun 08 2023 - 11:33:39 EST


On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 06, 2023 at 01:01:20PM -0400, Demi Marie Obenour wrote:
> > > > On Tue, Jun 06, 2023 at 10:25:47AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, May 30, 2023 at 04:31:13PM -0400, Demi Marie Obenour wrote:
> > > > > > - if (be->major | be->minor) {
> > > > > > - if (be->major != major || be->minor != minor)
> > > > > > - pr_warn("changing physical device (from %x:%x to %x:%x) not supported.\n",
> > > > > > - be->major, be->minor, major, minor);
> > > > > > + diskseq_str = xenbus_read(XBT_NIL, dev->nodename, "diskseq", &diskseq_len);
> > > > > > + if (IS_ERR(diskseq_str)) {
> > > > > > + int err = PTR_ERR(diskseq_str);
> > > > > > + diskseq_str = NULL;
> > > > > > +
> > > > > > + /*
> > > > > > + * If this does not exist, it means legacy userspace that does not
> > > > > > + * support diskseq.
> > > > > > + */
> > > > > > + if (unlikely(!XENBUS_EXIST_ERR(err))) {
> > > > > > + xenbus_dev_fatal(dev, err, "reading diskseq");
> > > > > > + return;
> > > > > > + }
> > > > > > + diskseq = 0;
> > > > > > + } else if (diskseq_len <= 0) {
> > > > > > + xenbus_dev_fatal(dev, -EFAULT, "diskseq must not be empty");
> > > > > > + goto fail;
> > > > > > + } else if (diskseq_len > 16) {
> > > > > > + xenbus_dev_fatal(dev, -ERANGE, "diskseq too long: got %d but limit is 16",
> > > > > > + diskseq_len);
> > > > > > + goto fail;
> > > > > > + } else if (diskseq_str[0] == '0') {
> > > > > > + xenbus_dev_fatal(dev, -ERANGE, "diskseq must not start with '0'");
> > > > > > + goto fail;
> > > > > > + } else {
> > > > > > + char *diskseq_end;
> > > > > > + diskseq = simple_strtoull(diskseq_str, &diskseq_end, 16);
> > > > > > + if (diskseq_end != diskseq_str + diskseq_len) {
> > > > > > + xenbus_dev_fatal(dev, -EINVAL, "invalid diskseq");
> > > > > > + goto fail;
> > > > > > + }
> > > > > > + kfree(diskseq_str);
> > > > > > + diskseq_str = NULL;
> > > > > > + }
> > > > >
> > > > > Won't it be simpler to use xenbus_scanf() with %llx formatter?
> > > >
> > > > xenbus_scanf() doesn’t check for overflow and accepts lots of junk it
> > > > really should not. Should this be fixed in xenbus_scanf()?
> > >
> > > That would be my preference, so that you can use it here instead of
> > > kind of open-coding it.
> >
> > This winds up being a much more invasive patch as it requires changing
> > sscanf(). It also has a risk (probably mostly theoretical) of breaking
> > buggy userspace that passes garbage values here.
>
> Well, if the current function is not suitable for your purposes it
> would be better to fix it rather than open-code what you need. Mostly
> because further usages would then also need to open-code whatever
> required.

That is fair.

> > > > > Also, you tie this logic to the "physical-device" watch, which
> > > > > strictly implies that the "diskseq" node must be written to xenstore
> > > > > before the "physical-device" node. This seems fragile, but I don't
> > > > > see much better optiono since the "diskseq" is optional.
> > > >
> > > > What about including the diskseq in the "physical-device" node? Perhaps
> > > > use diskseq@major:minor syntax?
> > >
> > > Hm, how would you know whether the blkback instance in the kernel
> > > supports the diskseq syntax in physical-device?
> >
> > That’s what the next patch is for 🙂.
>
> Hm, I think we should separate diskseq support from the notify open
> stuff: it's possible a different (non-Linux) backend wants to
> implement open notify support but doesn't have diskseq.

I like this idea! What about having blkback set diskseq to zero?
Userspace could then replace it with the actual value.

> > > Can you fetch a disk using a diskseq identifier?
> >
> > Not yet, although I have considered adding this ability. It would be
> > one step towards a “diskseqfs” that userspace could use to open a device
> > by diskseq.
> >
> > > Why I understand that this is an extra safety check in order to assert
> > > blkback is opening the intended device, is this attempting to fix some
> > > existing issue?
> >
> > Yes, it is. I have a block script (written in C) that validates the
> > device it has opened before passing the information to blkback. It uses
> > the diskseq to do this, but for that protection to be complete, blkback
> > must also be aware of it.
>
> But if your block script opens the device, and keeps it open until
> blkback has also taken a reference to it, there's no way such device
> could be removed and recreated in the window you point out above, as
> there's always a reference on it taken?

This assumes that the block script is not killed in the meantime,
which is not a safe assumption due to timeouts and the OOM killer.

> > > I'm not sure I see how the major:minor numbers would point to a
> > > different device than the one specified by the toolstack unless the
> > > admin explicitly messes with the devices before blkback has got time
> > > to open them. But then the admin can already do pretty much
> > > everything it wants with the system.
> >
> > Admins typically refer to e.g. device-mapper devices by name, not by
> > major:minor number. If a device is destroyed and recreated right as the
> > block script is running, this race condition can occur.
>
> Right, but what about this device recreation happening after the admin
> has written the guest config file but before the call to (lib)xl
> happens? blkback would also end up using a different device than
> indented, and your proposed approach doesn't fix this. The only way to
> solve this would be to reference devices by UUID (iow: diskseq)
> directly in the guest config file.

That would be a good idea, but it is orthogonal to this patch. My
script opens the device and uses various means to check that it did
open the correct device. It then passes the diskseq to blkback.

> Then the block script will open the device by diskseq and pass the
> major:minor numbers to blkback.

Alternatively, the toolstack could write both the diskseq and
major:minor numbers and be confident that it is referring to the
correct device, no matter how long ago it got that information.
This could be quite useful for e.g. one VM exporting a device to
another VM by calling losetup(8) and expecting a human to make a
decision based on various properties about the device. In this
case there is no upper bound on the race window.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature