RE: [PATCH v4 4/6] vfio/type1: check dma map request is within a valid iova range

From: Shameerali Kolothum Thodi
Date: Wed Feb 28 2018 - 08:40:02 EST


Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
> Sent: Wednesday, February 28, 2018 11:53 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry
> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin Murphy
> <robin.murphy@xxxxxxx>
> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a valid
> iova range
>
> Hi Shameer,
>
> On 28/02/18 10:25, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
> >> Sent: Wednesday, February 28, 2018 9:02 AM
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>;
> >> Alex Williamson <alex.williamson@xxxxxxxxxx>
> >> Cc: pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-
> >> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry
> >> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin
> Murphy
> >> <robin.murphy@xxxxxxx>
> >> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a
> valid
> >> iova range
> >>
> >> Hi Shameer,
> >>
> >> On 27/02/18 10:57, Shameerali Kolothum Thodi wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Auger Eric [mailto:eric.auger@xxxxxxxxxx]
> >>>> Sent: Tuesday, February 27, 2018 8:27 AM
> >>>> To: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >>>> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@xxxxxxxxxx>;
> >>>> pmorel@xxxxxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-
> >>>> kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; John Garry
> >>>> <john.garry@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>; Robin
> >> Murphy
> >>>> <robin.murphy@xxxxxxx>
> >>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a
> >> valid
> >>>> iova range
> >>>>
> >>>> Hi,
> >>>> On 27/02/18 00:13, Alex Williamson wrote:
> >>>>> On Mon, 26 Feb 2018 23:05:43 +0100
> >>>>> Auger Eric <eric.auger@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>> Hi Shameer,
> >>>>>>
> >>>>>> [Adding Robin in CC]
> >>>>>> On 21/02/18 13:22, Shameer Kolothum wrote:
> >>>>>>> This checks and rejects any dma map request outside valid iova
> >>>>>>> range.
> >>>>>>>
> >>>>>>> Signed-off-by: Shameer Kolothum
> >>>> <shameerali.kolothum.thodi@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++
> >>>>>>> 1 file changed, 22 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
> >>>> b/drivers/vfio/vfio_iommu_type1.c
> >>>>>>> index a80884e..3049393 100644
> >>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>>>>> @@ -970,6 +970,23 @@ static int vfio_pin_map_dma(struct
> vfio_iommu
> >>>> *iommu, struct vfio_dma *dma,
> >>>>>>> return ret;
> >>>>>>> }
> >>>>>>>
> >>>>>>> +/*
> >>>>>>> + * Check dma map request is within a valid iova range
> >>>>>>> + */
> >>>>>>> +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
> >>>>>>> + dma_addr_t start, dma_addr_t end)
> >>>>>>> +{
> >>>>>>> + struct list_head *iova = &iommu->iova_list;
> >>>>>>> + struct vfio_iova *node;
> >>>>>>> +
> >>>>>>> + list_for_each_entry(node, iova, list) {
> >>>>>>> + if ((start >= node->start) && (end <= node->end))
> >>>>>>> + return true;
> >>>>>> I am now confused by the fact this change will prevent existing QEMU
> >>>>>> from working with this series on some platforms. For instance QEMU
> virt
> >>>>>> machine GPA space collides with Seattle PCI host bridge windows. On
> >> ARM
> >>>>>> the smmu and smmuv3 drivers report the PCI host bridge windows as
> >>>>>> reserved regions which does not seem to be the case on other
> platforms.
> >>>>>> The change happened in commit
> >>>> 273df9635385b2156851c7ee49f40658d7bcb29d
> >>>>>> (iommu/dma: Make PCI window reservation generic).
> >>>>>>
> >>>>>> For background, we already discussed the topic after LPC 2016. See
> >>>>>> https://www.spinics.net/lists/kernel/msg2379607.html.
> >>>>>>
> >>>>>> So is it the right choice to expose PCI host bridge windows as reserved
> >>>>>> regions? If yes shouldn't we make a difference between those and MSI
> >>>>>> windows in this series and do not reject any user space DMA_MAP
> >> attempt
> >>>>>> within PCI host bridge windows.
> >>>>>
> >>>>> If the QEMU machine GPA collides with a reserved region today, then
> >>>>> either:
> >>>>>
> >>>>> a) The mapping through the IOMMU works and the reserved region is
> >> wrong
> >>>>>
> >>>>> or
> >>>>>
> >>>>> b) The mapping doesn't actually work, QEMU is at risk of data loss by
> >>>>> being told that it worked, and we're justified in changing that
> >>>>> behavior.
> >>>>>
> >>>>> Without knowing the specifics of SMMU, it doesn't particularly make
> >>>>> sense to me to mark the entire PCI hierarchy MMIO range as reserved,
> >>>>> unless perhaps the IOMMU is incapable of translating those IOVAs.
> >>>> to me the limitation does not come from the smmu itself, which is a
> >>>> separate HW block sitting between the root complex and the
> interconnect.
> >>>> If ACS is not enforced by the PCIe subsystem, the transaction will never
> >>>> reach the IOMMU.
> >>>
> >>> True. And we do have one such platform where ACS is not enforced but
> >>> reserving the regions and possibly creating holes while launching VM will
> >>> make it secure. But I do wonder how we will solve the device grouping
> >>> in such cases.
> >>>
> >>> The Seattle PCI host bridge windows case you mentioned has any pci quirk
> >>> to claim that they support ACS?
> >> No there is none to my knowledge. I am applying Alex' not upstream ACS
> >> overwrite patch.
> >
> > Ok. But isn't that patch actually applicable to cases where ACS is really
> supported
> > by hardware but the capability is not available?
>
> My understanding is normally yes. If you apply the patch whereas the HW
> practically does not support ACS, then you fool the kernel pretending
> there is isolation whereas there is not. I don't know the exact
> capability of the HW on AMD Seattle and effectively I should have cared
> about it much earlier and if the HW capability were supported and not
> properly exposed we should have implemented a clean quirk for this platform.

Ok. Thanks for the details.

>
> I am just trying to see whether
> > the argument that we should allow DMA MAP requests for this(non-ACS case)
> > even if the Guest GPA conflict with reserved region holds good. The fact that
> may
> > be it was working before is that the Guest never actually allocated any GPA
> from
> > the reserved region or maybe I am missing something here.
>
> If my understanding is correct, in ideal world we would report the PCI
> host bridge window as reserved only in case ACS is not supported. If you
> apply the patch and overrides the ACS, then the DMA_MAP would be
> allowed. In case the HW does not support ACS, then you could face
> situations where one EP tries to access GPA that never reaches the IOMMU
> (because it corresponds to the BAR of another downstream EP). Same can
> happen at the moment.

Yes, this is my understanding too.

Just wondering the below changes to the iommu_dma_get_resv_regions() is
good enough to take care this issue or not.

Thanks,
Shameer

-- >8 --
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f05f3cf..b6e89d5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -34,6 +34,8 @@

#define IOMMU_MAPPING_ERROR 0

+#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
+
struct iommu_dma_msi_page {
struct list_head list;
dma_addr_t iova;
@@ -183,6 +185,9 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
if (!dev_is_pci(dev))
return;

+ if (pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
+ return;
+
bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
resource_list_for_each_entry(window, &bridge->windows) {
struct iommu_resv_region *region;
---8--


>
> Eric
> >
> > Thanks,
> > Shameer
> >
> >> Thanks
> >>
> >> Eric
> >>>
> >>>> In the case of such overlap, shouldn't we just warn the end-user that
> >>>> this situation is dangerous instead of forbidding the use case which
> >>>> worked "in most cases" until now.
> >>>
> >>> Yes, may be something similar to the allow_unsafe_interrupts case, if
> >>> that is acceptable.
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>>>> Are we trying to prevent untranslated p2p with this reserved range?
> >>>>> That's not necessarily a terrible idea, but it seems that doing it for
> >>>>> that purpose would need to be a lot smarter, taking into account ACS
> >>>>> and precisely selecting ranges within the peer address space that would
> >>>>> be untranslated. Perhaps only populated MMIO within non-ACS
> >>>>> hierarchies. Thanks,
> >>>>
> >>>> Indeed taking into account the ACS capability would refine the
> >>>> situations where a risk exists.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>>
> >>>>> Alex
> >>>>>