Re: Question about reserved_regions w/ Intel IOMMU

From: Jason Gunthorpe
Date: Fri Jun 16 2023 - 08:20:54 EST


On Fri, Jun 16, 2023 at 08:39:46AM +0000, Tian, Kevin wrote:
> +Alex
>
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, June 13, 2023 11:54 PM
> >
> > On Thu, Jun 08, 2023 at 04:28:24PM +0100, Robin Murphy wrote:
> >
> > > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > > which is not related to peer-to-peer accesses.
> > >
> > > Right, in general the IOMMU driver cannot be held responsible for
> > > whatever might happen upstream of the IOMMU input.
> >
> > The driver yes, but..
> >
> > > The DMA layer carves PCI windows out of its IOVA space
> > > unconditionally because we know that they *might* be problematic,
> > > and we don't have any specific constraints on our IOVA layout so
> > > it's no big deal to just sacrifice some space for simplicity.
> >
> > This is a problem for everything using UNMANAGED domains. If the iommu
> > API user picks an IOVA it should be able to expect it to work. If the
> > interconnect fails to allow it to work then this has to be discovered,
> > otherwise UNMANAGED domains are not usable at all.
> >
> > E.g. vfio and iommufd are also in trouble on these configurations.
> >
>
> If those PCI windows are problematic e.g. due to ACS they belong to
> a single iommu group. If a vfio user opens all the devices in that group
> then it can discover and reserve those windows in its IOVA space.

How? We don't even exclude the single device's BAR if there is no ACS?
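
For reference, a minimal sketch of what a userspace consumer can check
today, assuming the documented sysfs format of one "start end type" line
per region in /sys/kernel/iommu_groups/<N>/reserved_regions. The function
names are hypothetical and not part of any existing tool. It only sees
regions the kernel actually reports, so a BAR or bridge window that is
never listed there cannot be found this way.

```python
def parse_reserved_regions(text):
    """Parse reserved_regions text into (start, end, type) tuples.

    Each line is expected to hold a hex start IOVA, a hex inclusive end
    IOVA, and a type string such as "msi" or "direct".
    """
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        start, end, kind = fields
        regions.append((int(start, 16), int(end, 16), kind))
    return regions


def overlapping(regions, iova, length):
    """Return the reported regions that intersect [iova, iova + length)."""
    last = iova + length - 1
    return [r for r in regions if r[0] <= last and iova <= r[1]]


def read_group_regions(group):
    """Read the reported reserved regions for one IOMMU group number."""
    path = f"/sys/kernel/iommu_groups/{group}/reserved_regions"
    with open(path) as f:
        return parse_reserved_regions(f.read())
```

A caller would run overlapping(read_group_regions(n), iova, size) before
mapping and pick a different IOVA if the result is not empty.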

> The problem is that the user may not open all the devices, and then
> there is currently no way for it to know the windows on those
> unopened devices.
>
> Curious why nobody complained about this gap before this thread...

Probably because it only matters if you have a real PCIe switch in the
system, which is pretty rare.

Jason