Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: David Gibson
Date: Tue Apr 27 2021 - 01:15:16 EST


On Fri, Apr 23, 2021 at 10:31:46AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, April 23, 2021 7:40 AM
> >
> > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote:
> >
> > > Because it's fundamental to the isolation of the device? What you're
> > > proposing doesn't get around the group issue, it just makes it implicit
> > > rather than explicit in the uapi.
> >
> > I'm not even sure it makes it explicit or implicit, it just takes away
> > the FD.
> >
> > There are four group IOCTLs, I see them mapping to /dev/ioasid follows:
> > VFIO_GROUP_GET_STATUS -
> > + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant
> > + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under
> > kernel/iomm_groups, or could be an IOCTL on /dev/ioasid
> > IOASID_ALL_DEVICES_VIABLE
> >
> > VFIO_GROUP_SET_CONTAINER -
> > + This happens implicitly when the device joins the IOASID
> > so it gets moved to the vfio_device FD:
> > ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd)
> >
> > VFIO_GROUP_UNSET_CONTAINER -
> > + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD
> >
> > VFIO_GROUP_GET_DEVICE_FD -
> > + Replaced by opening /dev/vfio/deviceX
> > Learn the deviceX which will be the cdev sysfs shows as:
> > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev
> > Open /dev/vfio/deviceX
> >
> > > > How do we model the VFIO group security concept to something like
> > > > VDPA?
> > >
> > > Is it really a "VFIO group security concept"? We're reflecting the
> > > reality of the hardware, not all devices are fully isolated.
> >
> > Well, exactly.
> >
> > /dev/ioasid should understand the group concept somehow, otherwise it
> > is incomplete and maybe even security broken.
> >
> > So, how do I add groups to, say, VDPA in a way that makes sense? The
> > only answer I come to is broadly what I outlined here - make
> > /dev/ioasid do all the group operations, and do them when we enjoin
> > the VDPA device to the ioasid.
> >
> > Once I have solved all the groups problems with the non-VFIO users,
> > then where does that leave VFIO? Why does VFIO need a group FD if
> > everyone else doesn't?
> >
> > > IOMMU group. This is the reality that any userspace driver needs to
> > > play in, it doesn't magically go away because we drop the group file
> > > descriptor.
> >
> > I'm not saying it does, I'm saying it makes the uAPI more regular and
> > easier to fit into /dev/ioasid without the group FD.
> >
> > > It only makes the uapi more difficult to use correctly because
> > > userspace drivers need to go outside of the uapi to have any idea
> > > that this restriction exists.
> >
> > I don't think it makes any substantive difference one way or the
> > other.
> >
> > With the group FD: the userspace has to read sysfs, find the list of
> > devices in the group, open the group fd, create device FDs for each
> > device using the name from sysfs.
> >
> > Starting from a BDF the general pseudo code is
> > group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group")
> > group_name = basename(group_path)
> > group_fd = open("/dev/vfio/"+group_name)
> > device_fd = ioctl(VFIO_GROUP_GET_DEVICE_FD, BDF);
> >
> > Without the group FD: the userspace has to read sysfs, find the list
> > of devices in the group and then open the device-specific cdev (found
> > via sysfs) and link them to a /dev/ioasid FD.
> >
> > Starting from a BDF the general pseudo code is:
> > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/")
> > device_fd = open("/dev/vfio/"+device_name)
> > ioasidfd = open("/dev/ioasid")
> > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd)
> >
> > These two routes can have identical outcomes and identical security
> > checks.
> >
> > In both cases if userspace wants a list of BDFs in the same group as
> > the BDF it is interested in:
> > readdir("/sys/bus/pci/devices/BDF/iommu_group/devices")
> >
> > It seems like a very small difference to me.
> >
> > I still don't see how the group restriction gets surfaced to the
> > application through the group FD. The applications I looked through
> > just treat the group FD as a step on their way to get the device_fd.
> >
>
> So your proposal sort of moves the entire container/group/domain
> managment into /dev/ioasid and then leaves vfio only provide device
> specific uAPI. An ioasid represents a page table (address space), thus
> is equivalent to the scope of VFIO container.

Right. I don't really know how /dev/iosasid is supposed to work, and
so far I don't see how it conceptually differs from a container. What
is it adding?

> Having the device join
> an ioasid is equivalent to attaching a device to VFIO container, and
> here the group integrity must be enforced. Then /dev/ioasid anyway
> needs to manage group objects and their association with ioasid and
> underlying iommu domain thus it's pointless to keep same logic within
> VFIO. Is this understanding correct?
>
> btw one remaining open is whether you expect /dev/ioasid to be
> associated with a single iommu domain, or multiple. If only a single
> domain is allowed, the ioasid_fd is equivalent to the scope of VFIO
> container. It is supposed to have only one gpa_ioasid_id since one
> iommu domain can only have a single 2nd level pgtable. Then all other
> ioasids, once allocated, must be nested on this gpa_ioasid_id to fit
> in the same domain. if a legacy vIOMMU is exposed (which disallows
> nesting), the userspace has to open an ioasid_fd for every group.
> This is basically the VFIO way. On the other hand if multiple domains
> is allowed, there could be multiple ioasid_ids each holding a 2nd level
> pgtable and an iommu domain (or a list of pgtables and domains due to
> incompatibility issue as discussed in another thread), and can be
> nested by other ioasids respectively. The application only needs
> to open /dev/ioasid once regardless of whether vIOMMU allows
> nesting, and has a single interface for ioasid allocation. Which way
> do you prefer to?
>
> Thanks
> Kevin
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature