Re: Question about reserved_regions w/ Intel IOMMU

From: Robin Murphy
Date: Thu Jun 08 2023 - 11:28:50 EST


On 2023-06-08 04:03, Baolu Lu wrote:
On 6/8/23 7:03 AM, Alexander Duyck wrote:
On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
<alexander.duyck@xxxxxxxxx> wrote:

I am running into a DMA issue that appears to be a conflict between
ACS and IOMMU. As per the documentation I can find, the IOMMU is
supposed to create reserved regions for MSI and the memory window
behind the root port. However looking at reserved_regions I am not
seeing that. I only see the reservation for the MSI.

So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
# cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
0x00000000fee00000 0x00000000feefffff msi

Shouldn't there also be a memory window for the region behind the root
port to prevent any possible peer-to-peer access?
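
For anyone wanting to check this kind of thing mechanically, here is a minimal sketch that parses reserved_regions output (one "start end type" entry per line, with inclusive end addresses) and tests whether a given address range is covered. The helper names and the bridge window address are made up for illustration; only the msi line comes from the output above.

```python
from typing import NamedTuple


class ReservedRegion(NamedTuple):
    start: int
    end: int    # inclusive, as printed by sysfs
    kind: str


def parse_reserved_regions(text):
    """Parse the text of an iommu_group reserved_regions file."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        start, end, kind = fields
        regions.append(ReservedRegion(int(start, 16), int(end, 16), kind))
    return regions


def is_covered(regions, start, end):
    """True if [start, end] lies entirely inside one reserved region."""
    return any(r.start <= start and end <= r.end for r in regions)


# The single line reported for the NIC's group.
SAMPLE = "0x00000000fee00000 0x00000000feefffff msi\n"
regions = parse_reserved_regions(SAMPLE)

# Hypothetical root port memory window, for illustration only.
WINDOW = (0xFB000000, 0xFB7FFFFF)
window_is_reserved = is_covered(regions, *WINDOW)
```

With only the msi entry present, window_is_reserved comes out False, which matches the observation that the window is not listed.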

Since the iommu portion of the email bounced I figured I would fix
that and provide some additional info.

I added some instrumentation to the kernel to dump the resources found
in iova_reserve_pci_windows. From what I can tell it is finding the
correct resources for the Memory and Prefetchable regions behind the
root port. It seems to be calling reserve_iova which is successfully
allocating an iova to reserve the region.

However still no luck on why it isn't showing up in reserved_regions.

Perhaps I can ask the opposite question: why should it show up in
reserved_regions? Why does the iommu subsystem block any possible
peer-to-peer DMA access? Isn't that a decision for the device driver?

The iova_reserve_pci_windows() you've seen is for the kernel DMA
interfaces, and is not related to peer-to-peer accesses.

Right, in general the IOMMU driver cannot be held responsible for whatever might happen upstream of the IOMMU input. The DMA layer carves PCI windows out of its IOVA space unconditionally because we know that they *might* be problematic, and we don't have any specific constraints on our IOVA layout so it's no big deal to just sacrifice some space for simplicity. We don't want to have to go digging any further into bus-specific code to reason about whether the right ACS capabilities are present and enabled everywhere to prevent direct P2P or not. Other use-cases may have different requirements, though, so it's up to them what they want to do.
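
As a rough illustration of what "carving windows out of the IOVA space" means, here is a toy model. It is not the kernel's allocator (the class, its first-fit bottom-up policy, and the addresses are all assumptions made for the sketch); it only shows that ranges reserved up front are never handed out, and that this is private to the allocator rather than something advertised to anyone else.

```python
class ToyIovaSpace:
    """Toy IOVA space: ranges reserved up front are skipped by alloc()."""

    def __init__(self, limit):
        self.limit = limit    # exclusive upper bound of the space
        self.used = []        # sorted list of (start, end), end inclusive

    def reserve(self, start, end):
        """Mark [start, end] as unavailable, e.g. a PCI bridge window."""
        self.used.append((start, end))
        self.used.sort()

    def alloc(self, size):
        """First-fit allocation from the bottom of the space."""
        base = 0
        for start, end in self.used:
            if base + size - 1 < start:
                break                     # the gap before this range fits
            base = max(base, end + 1)     # otherwise step past the range
        if base + size > self.limit:
            raise MemoryError("IOVA space exhausted")
        self.reserve(base, base + size - 1)
        return base


space = ToyIovaSpace(limit=1 << 20)
space.reserve(0x2000, 0x3FFF)    # hypothetical bridge window

first = space.alloc(0x1000)      # fits below the window
second = space.alloc(0x2000)     # too big for the gap, lands above the window
third = space.alloc(0x1000)      # fills the remaining gap below the window
```

Sacrificing the window costs the allocator nothing but a little address space, which is why it can be done unconditionally without reasoning about ACS.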

It's conceptually pretty much the same as the case where the device (or indeed a PCI host bridge or other interconnect segment in-between) has a constrained DMA address width - the device may not be able to access all of the address space that the IOMMU provides, but the IOMMU itself can't tell you that.
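
To put numbers on that, a small sketch: the highest address a device can actually use is bounded by the narrower of the IOMMU's input width and the device's own DMA mask. The function names and the 48-bit/32-bit widths are just example assumptions; dma_bit_mask() mirrors the arithmetic of the kernel's DMA_BIT_MASK() macro.

```python
def dma_bit_mask(bits):
    """Highest address reachable with the given address width."""
    return (1 << bits) - 1


def usable_iova_limit(iommu_bits, device_bits):
    """Highest IOVA the device can reach through the IOMMU."""
    return min(dma_bit_mask(iommu_bits), dma_bit_mask(device_bits))


# Example: an IOMMU with a 48-bit input space and a device that can
# only drive 32 address bits.
limit = usable_iova_limit(iommu_bits=48, device_bits=32)
reachable = 0x0000_0000_FFFF_F000 <= limit      # below 4 GiB
unreachable = 0x0000_0001_0000_0000 <= limit    # first byte above 4 GiB
```

Nothing in the IOMMU's own description of its address space reflects the 32-bit limit; that knowledge lives with whoever owns the device.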

Thanks,
Robin.