[PATCH 16/20 v2] iommu/amd: Optimize map_sg and unmap_sg

From: Joerg Roedel
Date: Tue Jul 12 2016 - 09:31:00 EST


On Tue, Jul 12, 2016 at 12:33:43PM +0100, Robin Murphy wrote:
> > + for_each_sg(sglist, s, nelems, i)
> > + npages += iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
>
> This fails to account for the segment boundary mask[1]. Given a typical
> sglist from the block layer where the boundary mask is 64K, the first
> segment is 8k long, and subsequent segments are 64K long, those
> subsequent segments will end up with misaligned addresses which certain
> hardware may object to.

Yeah, right. It doesn't matter much on x86, as the smallest boundary
mask I have seen is 4G, but to be correct it should be taken into
account. How does the attached patch look?

>
> > + address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
>
> Since a typical dma_map_sg() call is likely to involve >128K worth of
> data, I wonder if it's worth going directly to a slow-path IOVA
> allocation...

Well, the allocator is the bottleneck, so I try not to call it for
every sg element. The global locks have been removed, but more
allocations/deallocations also mean that the per-cpu free-lists fill
up faster and that we have to flush the IOTLBs more often, which costs
performance.
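
To make the trade-off concrete, the structure I'm after looks roughly
like this (again only a sketch; map_one_page_sketch() is a placeholder
for the driver-internal page-table update, and error unwinding as well
as the boundary padding from above are left out):

/* Placeholder for the driver-internal PTE update. */
static int map_one_page_sketch(struct dma_ops_domain *dma_dom,
			       unsigned long iova, phys_addr_t paddr)
{
	return 0;
}

static int map_sg_sketch(struct device *dev, struct scatterlist *sglist,
			 int nelems, struct dma_ops_domain *dma_dom,
			 u64 dma_mask)
{
	unsigned long npages = 0, mapped = 0, j;
	struct scatterlist *s;
	dma_addr_t address;
	int i;

	/* One pass over the list to size the whole request */
	for_each_sg(sglist, s, nelems, i)
		npages += iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);

	/* One allocator call per dma_map_sg(), not one per element */
	address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
	if (address == DMA_ERROR_CODE)
		return 0;

	/* Map every segment at consecutive offsets in that single range */
	for_each_sg(sglist, s, nelems, i) {
		unsigned long pages = iommu_num_pages(sg_phys(s), s->length,
						      PAGE_SIZE);
		phys_addr_t paddr = sg_phys(s) & PAGE_MASK;

		s->dma_address = address + (mapped << PAGE_SHIFT) + s->offset;
		sg_dma_len(s)  = s->length;

		for (j = 0; j < pages; j++, mapped++)
			map_one_page_sketch(dma_dom,
					    address + (mapped << PAGE_SHIFT),
					    paddr + ((phys_addr_t)j << PAGE_SHIFT));
	}

	return nelems;
}

The unmap path then gives the whole range back with a single call as
well, which is what keeps the per-cpu free-lists from filling up so
fast.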

> [1]:http://article.gmane.org/gmane.linux.kernel.iommu/10553 - almost the
> 1-year anniversary of you making much the same comment to me :D

Touché ;-)

Here is the updated patch: