Re: Summary of LPC guest MSI discussion in Santa Fe

From: Robin Murphy
Date: Wed Nov 09 2016 - 15:11:27 EST


On 09/11/16 18:59, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
>> On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
>>> On 11/08/2016 06:35 PM, Alex Williamson wrote:
>>>> On Tue, 8 Nov 2016 21:29:22 +0100
>>>> Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
>>>>> Is my understanding correct, that you need to tell userspace about the
>>>>> location of the doorbell (in the IOVA space) in case (2), because even
>>>>> though the configuration of the device is handled by the (host) kernel
>>>>> through trapping of the BARs, we have to avoid the VFIO user programming
>>>>> the device to create other DMA transactions to this particular address,
>>>>> since that will obviously conflict and either not produce the desired
>>>>> DMA transactions or result in unintended weird interrupts?
>>
>> Yes, that's the crux of the issue.
>>
>>>> Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
>>>> it's potentially a DMA target and we'll get bogus data on DMA read from
>>>> the device, and lose data and potentially trigger spurious interrupts on
>>>> DMA write from the device. Thanks,
>>>>
>>> That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
>>> they address match before the SMMU checks are done. If all DMA addrs
>>> had to go through SMMU first, then the DMA access could be
>>> ignored/rejected.
>>
>> That's actually not true :( The SMMU can't generally distinguish between
>> MSI writes and DMA writes, so it would just see a write transaction to the
>> doorbell address, regardless of how it was generated by the endpoint.
>>
>> Will
>>
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest, but not at the same IOVA as memory
> on real hw?

MSI doorbells integral to PCIe root complexes (and thus untranslatable)
typically have a programmable address, so could be anywhere. As an example
from the more general category of "special hardware addresses": QEMU's
default ARM guest memory map puts RAM starting at 0x40000000, and on the
ARM Juno platform that happens to be where PCI config space starts. As
Juno's PCIe doesn't support ACS, peer-to-peer or anything clever, if you
assign the PCI bus to a guest (all of it, given the lack of ACS), the root
complex just sees the guest's attempts to DMA to "memory" as the device
attempting to access config space, and aborts them.
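
To make the collision concrete, here is a minimal userspace sketch of the
overlap test involved. Only the 0x40000000 base comes from the description
above; the struct, the function name and any sizes you feed it are
hypothetical, and this is not code from QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a region in bus/IOVA address space. */
struct addr_range {
    uint64_t base;
    uint64_t size;   /* in bytes, must be non-zero */
};

/*
 * Return true if the two ranges share at least one address.
 * Assumes neither range wraps past the top of the 64-bit space.
 */
static bool ranges_overlap(struct addr_range a, struct addr_range b)
{
    assert(a.size > 0 && b.size > 0);

    /* Work with inclusive last addresses to avoid overflow at the top. */
    uint64_t a_last = a.base + (a.size - 1);
    uint64_t b_last = b.base + (b.size - 1);

    return a.base <= b_last && b.base <= a_last;
}
```

With guest RAM at base 0x40000000 and a config window at the same base,
ranges_overlap() returns true whatever the (assumed) sizes are, which is
exactly the case where guest DMA to "RAM" never reaches memory.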

> How are memory holes passed to SMMU so it doesn't have this issue for
> bare-metal (assign an IOVA that overlaps an MSI doorbell address)?

When we *are* in full control of the IOVA space, we just carve out what
we can find as best we can - see iova_reserve_pci_windows() in
dma-iommu.c, which isn't really all that different to what x86 does
(e.g. init_reserved_iova_ranges() in amd_iommu.c). Note that we don't
actually have any way currently to discover upstream MSI doorbells
(ponder dw_pcie_msi_init() in pcie-designware.c for an example of the
problem) - the specific MSI support we have in DMA ops at the moment
only covers GICv2m or GICv3 ITS downstream of translation, but
fortunately that's the typical relevant use-case on current platforms.
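
As a rough illustration of the "carve out" idea, the sketch below is a
simplified userspace model: a first-fit allocator that skips a list of
reserved windows. The names (struct resv, alloc_iova_first_fit, IOVA_FAIL)
are hypothetical, and this is deliberately not the kernel's IOVA allocator
or its algorithm; it only shows why a reserved range can never be handed
out as a DMA address once it is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved window, both ends inclusive. */
struct resv {
    uint64_t start;
    uint64_t end;
};

#define IOVA_FAIL UINT64_MAX

/*
 * Return the lowest base such that [base, base + size - 1] lies within
 * [0, limit] and touches no reserved window, or IOVA_FAIL if none fits.
 * The reserved list may be unsorted; we rescan after every bump.
 */
static uint64_t alloc_iova_first_fit(const struct resv *resv, size_t nresv,
                                     uint64_t size, uint64_t limit)
{
    uint64_t candidate = 0;
    size_t i;

    assert(size > 0);
again:
    /* Reject candidates whose last byte would exceed the limit. */
    if (candidate > limit || size - 1 > limit - candidate)
        return IOVA_FAIL;

    for (i = 0; i < nresv; i++) {
        uint64_t last = candidate + (size - 1);

        if (candidate <= resv[i].end && resv[i].start <= last) {
            if (resv[i].end == UINT64_MAX)
                return IOVA_FAIL;
            /* Bump past the window we collided with and retry. */
            candidate = resv[i].end + 1;
            goto again;
        }
    }
    return candidate;
}
```

The weak point is the same as in the real thing: a doorbell that nobody
put on the reserved list is invisible to the allocator.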

Robin.