Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks

From: Roger Pau Monné
Date: Thu Jun 08 2023 - 04:29:34 EST


On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 06, 2023 at 01:01:20PM -0400, Demi Marie Obenour wrote:
> > > On Tue, Jun 06, 2023 at 10:25:47AM +0200, Roger Pau Monné wrote:
> > > > On Tue, May 30, 2023 at 04:31:13PM -0400, Demi Marie Obenour wrote:
> > > > > - if (be->major | be->minor) {
> > > > > - if (be->major != major || be->minor != minor)
> > > > > - pr_warn("changing physical device (from %x:%x to %x:%x) not supported.\n",
> > > > > - be->major, be->minor, major, minor);
> > > > > + diskseq_str = xenbus_read(XBT_NIL, dev->nodename, "diskseq", &diskseq_len);
> > > > > + if (IS_ERR(diskseq_str)) {
> > > > > + int err = PTR_ERR(diskseq_str);
> > > > > + diskseq_str = NULL;
> > > > > +
> > > > > + /*
> > > > > + * If this does not exist, it means legacy userspace that does not
> > > > > + * support diskseq.
> > > > > + */
> > > > > + if (unlikely(!XENBUS_EXIST_ERR(err))) {
> > > > > + xenbus_dev_fatal(dev, err, "reading diskseq");
> > > > > + return;
> > > > > + }
> > > > > + diskseq = 0;
> > > > > + } else if (diskseq_len <= 0) {
> > > > > + xenbus_dev_fatal(dev, -EFAULT, "diskseq must not be empty");
> > > > > + goto fail;
> > > > > + } else if (diskseq_len > 16) {
> > > > > + xenbus_dev_fatal(dev, -ERANGE, "diskseq too long: got %d but limit is 16",
> > > > > + diskseq_len);
> > > > > + goto fail;
> > > > > + } else if (diskseq_str[0] == '0') {
> > > > > + xenbus_dev_fatal(dev, -ERANGE, "diskseq must not start with '0'");
> > > > > + goto fail;
> > > > > + } else {
> > > > > + char *diskseq_end;
> > > > > + diskseq = simple_strtoull(diskseq_str, &diskseq_end, 16);
> > > > > + if (diskseq_end != diskseq_str + diskseq_len) {
> > > > > + xenbus_dev_fatal(dev, -EINVAL, "invalid diskseq");
> > > > > + goto fail;
> > > > > + }
> > > > > + kfree(diskseq_str);
> > > > > + diskseq_str = NULL;
> > > > > + }
> > > >
> > > > Won't it be simpler to use xenbus_scanf() with %llx formatter?
> > >
> > > xenbus_scanf() doesn’t check for overflow and accepts lots of junk it
> > > really should not. Should this be fixed in xenbus_scanf()?
> >
> > That would be my preference, so that you can use it here instead of
> > kind of open-coding it.
>
> This winds up being a much more invasive patch as it requires changing
> sscanf(). It also has a risk (probably mostly theoretical) of breaking
> buggy userspace that passes garbage values here.

Well, if the current function is not suitable for your purposes it
would be better to fix it rather than open-code what you need. Mostly
because further usages would then also need to open-code whatever
required.

> > > > Also, you tie this logic to the "physical-device" watch, which
> > > > strictly implies that the "diskseq" node must be written to xenstore
> > > > before the "physical-device" node. This seems fragile, but I don't
> > > > see much better optiono since the "diskseq" is optional.
> > >
> > > What about including the diskseq in the "physical-device" node? Perhaps
> > > use diskseq@major:minor syntax?
> >
> > Hm, how would you know whether the blkback instance in the kernel
> > supports the diskseq syntax in physical-device?
>
> That’s what the next patch is for 🙂.

Hm, I think we should separate diskseq support from the notify open
stuff: it's possible a different (non-Linux) backend wants to
implement open notify support but doesn't have diskseq.

> > Can you fetch a disk using a diskseq identifier?
>
> Not yet, although I have considered adding this ability. It would be
> one step towards a “diskseqfs” that userspace could use to open a device
> by diskseq.
>
> > Why I understand that this is an extra safety check in order to assert
> > blkback is opening the intended device, is this attempting to fix some
> > existing issue?
>
> Yes, it is. I have a block script (written in C) that validates the
> device it has opened before passing the information to blkback. It uses
> the diskseq to do this, but for that protection to be complete, blkback
> must also be aware of it.

But if your block script opens the device, and keeps it open until
blkback has also taken a reference to it, there's no way such device
could be removed and recreated in the window you point out above, as
there's always a reference on it taken?

> > I'm not sure I see how the major:minor numbers would point to a
> > different device than the one specified by the toolstack unless the
> > admin explicitly messes with the devices before blkback has got time
> > to open them. But then the admin can already do pretty much
> > everything it wants with the system.
>
> Admins typically refer to e.g. device-mapper devices by name, not by
> major:minor number. If a device is destroyed and recreated right as the
> block script is running, this race condition can occur.

Right, but what about this device recreation happening after the admin
has written the guest config file but before the call to (lib)xl
happens? blkback would also end up using a different device than
indented, and your proposed approach doesn't fix this. The only way to
solve this would be to reference devices by UUID (iow: diskseq)
directly in the guest config file.

Then the block script will open the device by diskseq and pass the
major:minor numbers to blkback.

Thanks, Roger.