Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Fri Jun 04 2021 - 08:28:35 EST


On Fri, Jun 04, 2021 at 08:38:26AM +0000, Tian, Kevin wrote:
> > I think more to drive the replacement design; if we can't figure out
> > how to do something other than backwards compatibility trickery in the
> > kernel, it's probably going to bite us. Thanks,
>
> I'm a bit lost on the desired flow in your minds. Here is one flow based
> on my understanding of this discussion. Please comment whether it
> matches your thinking:
>
> 0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD;
>
> 1) Qemu binds dev1 to ioasid_fd;
>
> 2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry
> IOMMU_CACHE info i.e. whether underlying IOMMU can enforce snoop;
>
> 3) Qemu plans to create a gpa_ioasid, and attach dev1 to it. Here Qemu
> needs to figure out whether dev1 wants to do no-snoop. This might
> be based on a fixed vendor/class list or specified by user;
>
> 4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop'
> flag is specified to decide the page table format, which is supposed
> to match dev1;
>
> 5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this
> point, specify snoop/no-snoop again. If not supported by related
> iommu or different from what gpa_ioasid has, attach fails.

Why do we need to specify it again?

If the IOASID was created with the "block no-snoop" flag then it is
blocked in that IOASID, and that blocking sets the page table format.

The only question is if we can successfully attach a device to the
page table, or not.
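
As a rough illustration of that attach rule, here is a small userspace
model in C. It is not kernel code and not the proposed uAPI: the
model_* names, the struct layouts and the -EINVAL return value are all
made up for this sketch. The point it tries to show is that attach takes
no snoop argument; the flag lives in the IOASID and attach only reports
whether the device can use that page table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types, not the proposed uAPI structures. */
struct model_ioasid {
	bool block_no_snoop;		/* chosen once, at IOASID_ALLOC time */
};

struct model_device {
	bool iommu_enforces_snoop;	/* as reported via IOASID_GET_DEV_INFO */
};

/*
 * Attach carries no snoop/no-snoop argument. It only answers whether
 * this device can use the page table format the IOASID already has.
 */
static int model_attach(const struct model_ioasid *ioasid,
			const struct model_device *dev)
{
	assert(ioasid && dev);

	/* Assumed error code: the IOMMU cannot block no-snoop here. */
	if (ioasid->block_no_snoop && !dev->iommu_enforces_snoop)
		return -EINVAL;

	return 0;
}
```

In this model an IOASID created without the flag accepts either kind of
device, which is an assumption of the sketch rather than something
settled in this thread.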

The KVM interface is a bit tricky because Alex said this is partially
about security: wbinvd is only enabled if someone has an FD to a device
that can support no-snoop.

Personally I think this got way too complicated; the KVM interface
should simply be

ioctl(KVM_ALLOW_INCOHERENT_DMA, ioasidfd, device_label)
ioctl(KVM_DISALLOW_INCOHERENT_DMA, ioasidfd, device_label)

and let qemu sort it out based on command flags, detection, whatever.

'ioasidfd, device_label' is the security proof that Alex asked
for. This needs to be some device in the ioasidfd that declares it is
capable of no-snoop. E.g. vfio_pci would always declare it is capable
of no-snoop.

No kernel callbacks, no kernel auto-sync/etc. If qemu mismatches the
IOASID block no-snoop flag with the KVM_x_INCOHERENT_DMA state then it
is just a kernel-harmless userspace bug.
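
To make the intended semantics a little more concrete, here is a
userspace model of the two calls, again in C and again only a sketch.
Everything named model_* is hypothetical, as are the error codes and the
choice to track the state as a per-device flag plus a counter. It
assumes the (ioasidfd, device_label) pair is accepted as proof only when
the labelled device exists in the ioasidfd and declares itself capable
of no-snoop, and that wbinvd stays enabled while at least one device is
allowed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 4

/* Hypothetical model of one device bound to an ioasidfd. */
struct model_dev {
	unsigned int label;	/* stands in for device_label */
	bool no_snoop_capable;	/* declared by the driver, e.g. vfio_pci */
	bool kvm_allowed;	/* set by ALLOW, cleared by DISALLOW */
};

struct model_ioasidfd {
	struct model_dev devs[MODEL_MAX_DEVS];
	size_t ndevs;
};

struct model_kvm {
	unsigned int incoherent_refs;	/* wbinvd enabled while > 0 */
};

static struct model_dev *model_find(struct model_ioasidfd *fd,
				    unsigned int label)
{
	for (size_t i = 0; i < fd->ndevs; i++)
		if (fd->devs[i].label == label)
			return &fd->devs[i];
	return NULL;
}

/* Models ioctl(KVM_ALLOW_INCOHERENT_DMA, ioasidfd, device_label). */
static int model_allow_incoherent_dma(struct model_kvm *kvm,
				      struct model_ioasidfd *fd,
				      unsigned int label)
{
	struct model_dev *dev;

	assert(kvm && fd);
	dev = model_find(fd, label);
	if (!dev)
		return -ENOENT;		/* no such device: no proof */
	if (!dev->no_snoop_capable)
		return -EPERM;		/* device cannot serve as proof */
	if (!dev->kvm_allowed) {	/* repeated ALLOW is a no-op */
		dev->kvm_allowed = true;
		kvm->incoherent_refs++;
	}
	return 0;
}

/* Models ioctl(KVM_DISALLOW_INCOHERENT_DMA, ioasidfd, device_label). */
static int model_disallow_incoherent_dma(struct model_kvm *kvm,
					 struct model_ioasidfd *fd,
					 unsigned int label)
{
	struct model_dev *dev;

	assert(kvm && fd);
	dev = model_find(fd, label);
	if (!dev)
		return -ENOENT;
	if (dev->kvm_allowed) {		/* DISALLOW without ALLOW is a no-op */
		dev->kvm_allowed = false;
		kvm->incoherent_refs--;
	}
	return 0;
}

static bool model_wbinvd_enabled(const struct model_kvm *kvm)
{
	return kvm->incoherent_refs > 0;
}
```

Note that nothing in the model looks at the IOASID block no-snoop flag.
Keeping the two in sync is left entirely to qemu.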

Then user space can decide which of the various axes it wants to
optimize for.

Jason