Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

From: Robin Murphy
Date: Thu Dec 08 2016 - 08:14:14 EST


On 08/12/16 09:36, Auger Eric wrote:
> Hi,
>
> On 15/11/2016 14:09, Eric Auger wrote:
>> Following LPC discussions, we now report reserved regions through
>> iommu-group sysfs reserved_regions attribute file.
>
>
> While I am respinning this series into v4, here is a tentative summary
> of technical topics for which no consensus was reached at this point.
>
> 1) Shall we report the usable IOVA range instead of reserved IOVA
> ranges. Not discussed at/after LPC.
> x I currently report reserved regions. Alex expressed the need to
> report the full usable IOVA range instead (x86 min-max range
> minus MSI APIC window). I think this is meaningful for ARM
> too where arm-smmu might not support the full 64b range.
> x Any objection we report the usable IOVA regions instead?

The issue with that is that we can't actually report "the usable
regions" at the moment, as that involves pulling together disjoint
properties of arbitrary hardware unrelated to the IOMMU. We'd be
reporting "the not-definitely-unusable regions, which may have some
unusable holes in them still". That seems like an ABI nightmare - I'd
still much rather say "here are some, but not necessarily all, regions
you definitely can't use", because saying "here are some regions which
you might be able to use most of, probably" is what we're already doing
today, via a single implicit region from 0 to ULONG_MAX ;)

The address space limits are definitely useful to know, but I think it
would be better to expose them separately to avoid the ambiguity. At
worst, I guess it would be reasonable to express the limits via an
"out-of-range" reserved region type for 0 to $base and $top to
ULONG-MAX. To *safely* expose usable regions, we'd have to start out
with a very conservative assumption (e.g. only IOVAs matching physical
RAM), and only expand them once we're sure we can detect every possible
bit of problematic hardware in the system - that's just too limiting to
be useful. And if we expose something knowingly inaccurate, we risk
having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
nobody wants that...

> 2) Shall the kernel check collision with MSI window* when userspace
> calls VFIO_IOMMU_MAP_DMA?
> Joerg/Will No; Alex yes
> *for IOVA regions consumed downstream to the IOMMU: everyone says NO

If we're starting off by having the SMMU drivers expose it as a fake
fixed region, I don't think we need to worry about this yet. We all seem
to agree that as long as we communicate the fixed regions to userspace,
it's then userspace's job to work around them. Let's come back to this
one once we actually get to the point of dynamically sizing and
allocating 'real' MSI remapping region(s).

Ultimately, the kernel *will* police collisions either way, because an
underlying iommu_map() is going to fail if overlapping IOVAs are ever
actually used, so it's really just a question of whether to have a more
user-friendly failure mode.

> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
> My current series does not expose them in iommu group sysfs.
> I understand we can expose the RMRR regions in the iomm group sysfs
> without necessarily supporting RMRR requiring device assignment.
> We can also add this support later.

As you say, reporting them doesn't necessitate allowing device
assignment, and it's information which can already be easily grovelled
out of dmesg (for intel-iommu at least) - there doesn't seem to be any
need to hide them, but the x86 folks can have the final word on that.

Robin.

> Thanks
>
> Eric
>
>
>>
>> Reserved regions are populated through the IOMMU get_resv_region callback
>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and
>> arm-smmu.
>>
>> The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an
>> IOMMU_RESV_NOMAP reserved region.
>>
>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
>> 1MB large) and the PCI host bridge windows.
>>
>> The series integrates a not officially posted patch from Robin:
>> "iommu/dma: Allow MSI-only cookies".
>>
>> This series currently does not address IRQ safety assessment.
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3
>>
>> History:
>> RFC v2 -> v3:
>> - switch to an iommu-group sysfs API
>> - use new dummy allocator provided by Robin
>> - dummy allocator initialized by vfio-iommu-type1 after enumerating
>> the reserved regions
>> - at the moment ARM MSI base address/size is left unchanged compared
>> to v2
>> - we currently report reserved regions and not usable IOVA regions as
>> requested by Alex
>>
>> RFC v1 -> v2:
>> - fix intel_add_reserved_regions
>> - add mutex lock/unlock in vfio_iommu_type1
>>
>>
>> Eric Auger (10):
>> iommu/dma: Allow MSI-only cookies
>> iommu: Rename iommu_dm_regions into iommu_resv_regions
>> iommu: Add new reserved IOMMU attributes
>> iommu: iommu_alloc_resv_region
>> iommu: Do not map reserved regions
>> iommu: iommu_get_group_resv_regions
>> iommu: Implement reserved_regions iommu-group sysfs file
>> iommu/vt-d: Implement reserved region get/put callbacks
>> iommu/arm-smmu: Implement reserved region get/put callbacks
>> vfio/type1: Get MSI cookie
>>
>> drivers/iommu/amd_iommu.c | 20 +++---
>> drivers/iommu/arm-smmu.c | 52 +++++++++++++++
>> drivers/iommu/dma-iommu.c | 116 ++++++++++++++++++++++++++-------
>> drivers/iommu/intel-iommu.c | 50 ++++++++++----
>> drivers/iommu/iommu.c | 141 ++++++++++++++++++++++++++++++++++++----
>> drivers/vfio/vfio_iommu_type1.c | 26 ++++++++
>> include/linux/dma-iommu.h | 7 ++
>> include/linux/iommu.h | 49 ++++++++++----
>> 8 files changed, 391 insertions(+), 70 deletions(-)
>>