RE: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE

From: Michael Kelley
Date: Mon Mar 04 2024 - 11:04:22 EST


From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Monday, March 4, 2024 7:55 AM
>
> On Mon, 4 Mar 2024 13:37:56 +0000
> Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> > On 04/03/2024 11:00 am, Petr Tesařík wrote:
> > [...]
> > >> Here's my take on tying all the threads together. There are
> > >> four alignment combinations:
> > >>
> > >> 1. alloc_align_mask: zero; min_align_mask: zero
> > >> 2. alloc_align_mask: zero; min_align_mask: non-zero
> > >> 3. alloc_align_mask: non-zero; min_align_mask: zero/ignored
> > >> 4. alloc_align_mask: non-zero; min_align_mask: non-zero
> > >
> > > What does "min_align_mask: zero/ignored" mean? Under which
> > > circumstances should a non-zero min_align_mask be ignored?

"Ignored" was my short-hand for the swiotlb_alloc() case where
orig_addr is zero. Even if min_align_mask is set for the device, it
doesn't have any effect when orig_addr is zero.

> > >
> > >> xen_swiotlb_map_page() and dma_direct_map_page() are #1 or #2
> > >> via swiotlb_map() and swiotlb_tbl_map_single()
> > >>
> > >> iommu_dma_map_page() is #3 and #4 via swiotlb_tbl_map_single()
> > >>
> > >> swiotlb_alloc() is #3, directly to swiotlb_find_slots()
> > >>
> > >> For #1, the returned physical address has no constraints if
> > >> the requested size is less than a page. For page size or
> > >> greater, the discussed historical requirement for page
> > >> alignment applies.
> > >>
> > >> For #2, min_align_mask governs the bits of the returned
> > >> physical address that must match the original address. When
> > >> needed, swiotlb must also allocate pre-padding aligned to
> > >> IO_TLB_SIZE that precedes the returned physical address. A
> > >> request size <= swiotlb_max_mapping_size() will not exceed
> > >> IO_TLB_SEGSIZE even with the padding. The historical
> > >> requirement for page alignment does not apply because the
> > >> driver has explicitly used the newer min_align_mask feature.
> > >
> > > What is the idea here? Is it the assumption that only old drivers rely
> > > on page alignment, so if they use min_align_mask, it proves that they
> > > are new and must not rely on page alignment?
> >
> > Yes, if a driver goes out of its way to set a min_align_mask which is
> > smaller than its actual alignment constraint, that is clearly the
> > driver's own bug. Strictly we only need to be sympathetic to drivers
> > which predate min_align_mask, when implicitly relying on page alignment
> > was all they had.
> >
> > >> For #3, alloc_align_mask specifies the required alignment. No
> > >> pre-padding is needed. Per earlier comments from Robin[1],
> > >> it's reasonable to assume alloc_align_mask (i.e., the granule)
> > >> is >= IO_TLB_SIZE. The original address is not relevant in
> > >> determining the alignment, and the historical page alignment
> > >> requirement does not apply since alloc_align_mask explicitly
> > >> states the alignment.
> >
> > FWIW I'm also starting to wonder about getting rid of the alloc_size
> > argument and just have SWIOTLB round the end address up to
> > alloc_align_mask itself as part of all these calculations. Seems like it
> > could potentially end up a little simpler, maybe?

Yes, I was thinking exactly this, but my motivation was to solve the
bug in #4 that I previously pointed out. If iommu_dma_map_page()
does *not* do

aligned_size = iova_align(iovad, size);

but swiotlb_tbl_map_single() rounds up the size based on
alloc_align_mask *after* adding the offset modulo
min_align_mask, then the rounded-up size won't exceed IO_TLB_SEGSIZE,
regardless of which bits are set in orig_addr.
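
Concretely, the ordering I have in mind inside swiotlb_tbl_map_single()
would look roughly like this (a sketch with local names of my own, not
the actual patch):

	/*
	 * Hypothetical sketch: add the min_align_mask offset first,
	 * then round the total up to the alloc_align_mask granule, so
	 * the caller no longer has to pre-round the size.
	 */
	unsigned int offset = orig_addr & dma_get_min_align_mask(dev);
	size_t alloc_size = size + offset;

	if (alloc_align_mask)
		alloc_size = ALIGN(alloc_size, alloc_align_mask + 1);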

> >
> > >> For #4, the returned physical address must match the bits
> > >> in the original address specified by min_align_mask. swiotlb
> > >> must also allocate pre-padding aligned to
> > >> alloc_align_mask that precedes the returned physical address.
> > >> Also per Robin[1], assume alloc_align_mask is >=
> > >> min_align_mask, which solves the conflicting alignment
> > >> problem pointed out by Petr[2]. Perhaps we should add a
> > >> "WARN_ON(alloc_align_mask < min_align_mask)" rather than
> > >> failing depending on which bits of the original address are
> > >> set. Again, the historical requirement for page alignment does
> > >> not apply.
> > >
> > > AFAICS the only reason this works in practice is that there are only
> > > two in-tree users of min_align_mask: NVMe and Hyper-V. Both use a mask
> > > of 12 bits, and the IOVA granule size is never smaller than 4K.
> >
> > If we assume a nonzero alloc_align_mask exclusively signifies iommu-dma,
> > then for this situation SWIOTLB should only need to worry about the
> > intersection of alloc_align_mask & min_align_mask, since any
> > min_align_mask bits larger than the IOVA granule would need to be
> > accounted for in the IOVA allocation regardless of SWIOTLB.
>
> Ah, right, it's not limited to bounce buffers.
>
> > > If we want to rely on this, then I propose to make it a BUG_ON()
> > > rather than a WARN_ON().
> >
> > I've just proposed a patch to make it not matter for now - the nature of
> > iommu-dma makes it slightly more awkward to prevent SWIOTLB from ever
> > seeing this condition at all, so I chose not to do that, but as long as
> > swiotlb_tbl_map_single() does *something* for conflicting constraints
> > without completely falling over, which swiotlb_tbl_unmap_single can then
> > undo again, then it should be fine.
>
> Yes. It may allocate an unsuitably aligned bounce buffer, or it may
> fail, but your IOMMU patch will continue to work (and also cover the
> non-SWIOTLB case).
>
> I believe this patch series is now good as is, except the commit
> message should make it clear that alloc_align_mask and min_align_mask
> can both be zero, but that simply means no alignment constraints.
>

No, I think we need to add the historical page alignment functionality
back again for my #1 (alloc_align_mask and min_align_mask both zero).
We don't know what old drivers might be depending on, and we don't
want to risk breaking them.
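
Something along these lines in swiotlb_tbl_map_single() would keep the
old guarantee for #1 without touching #2 through #4 (a sketch of the
idea only, not proposed patch text):

	/*
	 * Sketch: restore the historical page alignment only when the
	 * driver has expressed no explicit alignment constraint.
	 */
	if (!alloc_align_mask && !dma_get_min_align_mask(dev) &&
	    size >= PAGE_SIZE)
		alloc_align_mask = PAGE_SIZE - 1;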

Michael