Re: [PATCH v2 1/3] iommu/s390: Fix duplicate domain attachments

From: Niklas Schnelle
Date: Thu Sep 29 2022 - 03:48:08 EST


On Wed, 2022-09-28 at 10:32 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 28, 2022 at 10:58:22AM +0200, Niklas Schnelle wrote:
> > On Tue, 2022-09-27 at 13:56 -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 27, 2022 at 06:33:48PM +0200, Niklas Schnelle wrote:
> > >
> > > > Not sure what the non-MSI reservation is for? It does seem like x86_64
> > > > also uses this for quite large ranges.
> > >
> > > There are lots of things that are unsuitable for DMA on x86 platforms,
> > > unfortunately.. But yeah, I'm not sure either.
> > >
> > > > This is because I'm getting a map request for an IOVA in the reserved
> > > > region.
> > >
> > > How come? iova_reserve_iommu_regions() reads the reserved regions and
> > > loads them as reserved into the iovad which should cause
> > > iommu_dma_alloc_iova() and alloc_iova_fast() to not return values in
> > > those ranges.
> > >
> > > It all looks like it is supposed to work
> > >
> > > Did something go wrong in the initialization order perhaps?
> > >
> > > Jason
> >
> > It was of course a classic off-by-one, the table size is a number of
> > entries but geometry.aperture_end seems to be the largest allowed IOVA.
> > So we need:
>
> Right, I dislike this naming usually 'end' means "start + length" and
> 'last' means "start + length - 1"
>
> > Otherwise the first IOVA allocated is ZPCI_TABLE_SIZE_RT itself.
> > Similarly we need the second reserved region if (zdev->end_dma <
> > ZPCI_TABLE_SIZE_RT - 1). In your patch I think you had the
> > MAX_DMA_TABLE_ADDR name right but would have also calculated the number
> > of entries.
>
> Make sense..
>
> > On the other hand with the dma-iommu.c conversion it no longer makes
> > sense to lower zdev->end_dma artificially, so at least on current
> > machine LPARs we would end up with just a lower reserved region
> > 0x0000000000000000 to 0x00000000ffffffff and can use IOVAs up to
> > aperture_end.
>
> So zdev->end_dma == MAX_DMA_TABLE_ADDR?
>
> (and is zdev->end_dma and 'end' or 'last' ?)

Basically yes though at least on LPARs the firmware interface that
gives us the initial zdev->end returns an even higher value but we
clamp it down to the aperture. It is "start + length - 1".

>
> Can you include this patch once you are happy with it, it nicely
> tidies this series?
>
> Thanks,
> Jason

Yes will do. In the meantime I'm now close to sending an RFC version of
the conversion to dma-iommu. So my plan is to send out 3 series of
patches.

1. v3 of this series of IOMMU fixes including your suggestion to use
reserved ranges, the previously mentioned off-by-one fix and another
IOMMU issue I found (pgsize_bitmap is wrong).

2. A series of improvements to the s390 IOMMU code including
implementing map_pages() and lock-free page table updates

3. A series converting s390 to use dma-iommu plus changes against dma-
iommu.c common code to implement an alternative flushing scheme that
brings z/VM and KVM guest PCI performance back to the level of our
existing DMA API implementation.