Re: [PATCH v6 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present

From: Petr Tesařík
Date: Mon Mar 11 2024 - 16:05:19 EST


On Fri, 8 Mar 2024 15:28:27 +0000
Will Deacon <will@xxxxxxxxxx> wrote:

> Nicolin reports that swiotlb buffer allocations fail for an NVME device
> behind an IOMMU using 64KiB pages. This is because we end up with a
> minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
> safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
> page (i.e. preserving the 4KiB page offset from the original allocation).
> If the original address is not 4KiB-aligned, the allocation will fail
> because swiotlb_search_pool_area() erroneously compares these unmasked
> bits with the 64KiB-aligned candidate allocation.
>
> Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
> reduced based on the required alignment of the allocation.
>
> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> Reported-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/cover.1707851466.git.nicolinc@xxxxxxxxxx
> Tested-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Reviewed-by: Michael Kelley <mhklinux@xxxxxxxxxxx>
> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> ---
> kernel/dma/swiotlb.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c20324fba814..c381a7ed718f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> dma_addr_t tbl_dma_addr =
> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> unsigned long max_slots = get_max_slots(boundary_mask);
> - unsigned int iotlb_align_mask =
> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> unsigned int nslots = nr_slots(alloc_size), stride;
> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned int index, slots_checked, count = 0, i;
> @@ -993,6 +992,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + /*
> + * Ensure that the allocation is at least slot-aligned and update
> + * 'iotlb_align_mask' to ignore bits that will be preserved when
> + * offsetting into the allocation.
> + */
> + alloc_align_mask |= (IO_TLB_SIZE - 1);
> + iotlb_align_mask &= ~alloc_align_mask;
> +
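To restate the scenario from the commit message in numbers, here is a
standalone userspace sketch of how the rejection test changes with this
hunk. The concrete addresses and the 64KiB/4KiB masks are my own
assumptions based on the commit message, and I am assuming the default
IO_TLB_SHIFT of 11; none of the values come from the patch itself:

#include <stdio.h>

/* Assumed swiotlb constants: IO_TLB_SHIFT of 11, i.e. 2KiB slots. */
#define IO_TLB_SHIFT    11
#define IO_TLB_SIZE     (1UL << IO_TLB_SHIFT)

int main(void)
{
        unsigned long min_align_mask   = 0xfff;   /* 4KiB NVMe page - 1 */
        unsigned long alloc_align_mask = 0xffff;  /* 64KiB IOMMU granule - 1 */
        unsigned long orig_addr        = 0x1800;  /* not 4KiB-aligned */
        unsigned long candidate        = 0x10000; /* 64KiB-aligned slot */

        /* Old behaviour: bit 11 of orig_addr must match the candidate. */
        unsigned long old_mask = min_align_mask & ~(IO_TLB_SIZE - 1);

        /* New behaviour: bits covered by alloc_align_mask are ignored. */
        unsigned long new_mask = min_align_mask &
                                 ~(alloc_align_mask | (IO_TLB_SIZE - 1));

        /* Prints "old mask 0x800, reject 1" and "new mask 0, reject 0". */
        printf("old mask %#lx, reject %d\n", old_mask,
               (candidate & old_mask) != (orig_addr & old_mask));
        printf("new mask %#lx, reject %d\n", new_mask,
               (candidate & new_mask) != (orig_addr & new_mask));
        return 0;
}

So the 64KiB-aligned candidate is no longer rejected, which matches
Nicolin's report being fixed - but it also means bit 11 of orig_addr is
no longer enforced, which is what I am running into below.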

I have started writing the KUnit test suite, and the results look
incorrect to me in the following case.

I'm calling swiotlb_tbl_map_single() with:

* alloc_align_mask = 0xfff
* a device with min_align_mask = 0xfff
* the 12 lowest bits of orig_addr are 0xfa0

The iotlb_align_mask derived from min_align_mask becomes zero after the
masking added by this patch, and the 12 lowest bits of the returned
address are 0x7a0, i.e. not equal to 0xfa0.
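
For reference, here is the arithmetic as I understand it, again as a
userspace sketch; the constants mirror the values above and an assumed
IO_TLB_SHIFT of 11, the rest is made up for illustration:

#include <stdio.h>

#define IO_TLB_SHIFT    11
#define IO_TLB_SIZE     (1UL << IO_TLB_SHIFT)

int main(void)
{
        unsigned long orig_addr        = 0xfa0; /* low 12 bits of orig_addr */
        unsigned long min_align_mask   = 0xfff; /* dma_get_min_align_mask() */
        unsigned long alloc_align_mask = 0xfff;
        unsigned long iotlb_align_mask = min_align_mask;

        /* Masking added by this patch. */
        alloc_align_mask |= IO_TLB_SIZE - 1;
        iotlb_align_mask &= ~alloc_align_mask;

        /* swiotlb_align_offset() keeps only the bits below IO_TLB_SIZE. */
        unsigned long offset = orig_addr & min_align_mask & (IO_TLB_SIZE - 1);

        /* Prints 0 and 0x7a0: bit 11 of the original offset is lost. */
        printf("iotlb_align_mask %#lx, offset %#lx\n",
               iotlb_align_mask, offset);
        return 0;
}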

In other words, the min_align_mask constraint is not honored. Of course,
given the above values, it is not possible to honor both min_align_mask
and alloc_align_mask. I find it somewhat surprising that NVMe does not
in fact require that the low NVME_CTRL_PAGE_SHIFT bits be preserved,
as Nicolin's successful testing suggests.

Why is that?

Does the IOMMU code do some additional post-processing of the bounce
buffer address to restore the value of bit 11?

Or is this bit simply always zero in real-world scenarios?

Petr T