Re: Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y

From: Halil Pasic
Date: Fri Nov 03 2023 - 12:15:28 EST


On Fri, 03 Nov 2023 16:13:03 +0100
Niklas Schnelle <schnelle@xxxxxxxxxxxxx> wrote:

> The reason for 1) is a bit more convoluted and not entirely understood
> by us. We are certain though that the function swiotlb_find_slots()
> allocates a pool with nr_slots(alloc_size), where this alloc_size is
> the alloc_size from swiotlb_tbl_map_single() + swiotlb_align_offset(),
> but for alignment reasons some slots may get "skipped over" in
> swiotlb_area_find_slots() causing the picked slots to overrun the
> allocation.
>
> Not sure how to properly fix this as the different alignment
> requirements get pretty complex quickly. So would appreciate your
> input.

Let me post a more detailed analysis of why we observe
swiotlb_area_find_slots() considering the slot with index 0
invalid in our particular case, and how that relates to the
whole "alignment" complex.

Currently there are three distinct mechanisms that dictate the "alignment":
a) min_align_mask (introduced by commit 36950f2da1ea ("driver core: add a
   min_align_mask field to struct device_dma_parameters"))
b) alloc_size >= PAGE_SIZE which requires "page alignment"
c) alloc_align_mask.
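
For reference, our reading of how these three inputs get combined for the
slot search in swiotlb_area_find_slots() is roughly the following (a
paraphrase of the code as we understand it, not a verbatim quote):

	/* a) and c) */
	iotlb_align_mask = dma_get_min_align_mask(dev) | alloc_align_mask;
	/* b) */
	if (alloc_size >= PAGE_SIZE)
		iotlb_align_mask |= ~PAGE_MASK;
	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);

and a candidate slot is then rejected whenever

	(slot_addr(tbl_addr, index) & iotlb_align_mask) !=
	(orig_addr & iotlb_align_mask)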

In our case min_align_mask == 0, so a) is not applicable; because of b) and
c) we end up with iotlb_align_mask == 0x800. And because orig_addr & 0x800 ==
0x800 while pool->start & 0x800 == 0, the slot at index 0 is considered
invalid. In other words, slot 0 is skipped over because it is page aligned
while orig_addr has the 0x800 bit, that is 1UL << (PAGE_SHIFT - 1), set.
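
To make these numbers concrete, here is a tiny user-space sketch of the
arithmetic (the addresses are made up, PAGE_SIZE == 4096 and IO_TLB_SIZE ==
2048 are assumed, and it of course relies on the paraphrase above being
accurate):

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)

int main(void)
{
	unsigned long min_align_mask = 0;	/* a) not applicable here */
	unsigned long alloc_align_mask = 0xfff;	/* c), e.g. a 4k IOVA granule */
	unsigned long alloc_size = 2 * PAGE_SIZE;
	unsigned long orig_addr = 0x5f00f800;	/* made up, 0x800 bit set */
	unsigned long pool_start = 0x7e000000;	/* made up, page aligned */
	unsigned long iotlb_align_mask;

	iotlb_align_mask = min_align_mask | alloc_align_mask;
	if (alloc_size >= PAGE_SIZE)		/* b) */
		iotlb_align_mask |= ~PAGE_MASK;
	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);

	printf("iotlb_align_mask = %#lx\n", iotlb_align_mask);	/* 0x800 */
	printf("slot 0 skipped: %d\n",
	       (pool_start & iotlb_align_mask) !=
	       (orig_addr & iotlb_align_mask));			/* prints 1 */
	return 0;
}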

Let us note that with the current implementation the min_align_mask, that is
mechanism a), also controls the tlb_addr within the first slot, so that
tlb_addr & min_align_mask == orig_addr & min_align_mask. In that sense a) is
very unlike b) and c).

For example, if !min_align_mask, then tlb_addr & (IO_TLB_SIZE - 1) is always
0, even if alloc_size >= PAGE_SIZE or alloc_align_mask is non-zero.
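
To illustrate the asymmetry, another small user-space sketch; it assumes
that the in-slot offset is computed as orig_addr & min_align_mask &
(IO_TLB_SIZE - 1) (our reading of swiotlb_align_offset()), and the
addresses are again made up:

#include <stdio.h>

#define IO_TLB_SHIFT	11
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)

/* our reading of swiotlb_align_offset(): only a) influences the offset */
static unsigned long align_offset(unsigned long orig_addr,
				  unsigned long min_align_mask)
{
	return orig_addr & min_align_mask & (IO_TLB_SIZE - 1);
}

int main(void)
{
	unsigned long orig_addr = 0x5f00f964;	/* made-up original buffer */
	unsigned long slot = 0x7e000800;	/* made-up picked slot address */

	/* a) active: the low bits of orig_addr survive into tlb_addr */
	printf("min_align_mask == 0xfff: tlb_addr = %#lx\n",
	       slot + align_offset(orig_addr, 0xfff));
	/* a) inactive: tlb_addr stays IO_TLB_SIZE aligned despite b) and c) */
	printf("min_align_mask == 0:     tlb_addr = %#lx\n",
	       slot + align_offset(orig_addr, 0));
	return 0;
}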

If, with b) and c), the goal is that the swiotlb buffer shall not stretch
over more pages, or address space blocks of a size dictated by
alloc_align_mask, then that goal is accomplished. If however the goal is to
preserve the offsets modulo some power of 2, dictated either by PAGE_SHIFT
or by alloc_align_mask, then that goal is not reached.
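
For example (again assuming our reading above is right): with
min_align_mask == 0 and alloc_size >= PAGE_SIZE, a buffer at page offset
0x964 gets bounced to a tlb_addr at page offset 0x800, i.e. to a slot
boundary. The 0x800 bit is matched by the slot search, but the remaining
offset bits are dropped, so the offset within the page is not preserved.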

But there is more to it! In the beginning there was b), or more precisely, in
the olden days for mapping_size >= PAGE_SIZE we used to allocate properly page
aligned bounce buffers. That is, tlb_addr & ~PAGE_MASK == 0 regardless of what
orig_addr & ~PAGE_MASK & (IO_TLB_SIZE - 1) is. That first got borked by
commit 1f221a0d0dbf ("swiotlb: respect min_align_mask") and then it did not get
fixed by commit 0eee5ae10256 ("swiotlb: fix slot alignment checks").

Let us also note that if more than one of the above mechanisms is applicable,
then for the slot selection the idea is apparently to go with the strictest
"alignment requirement", while for the offset within the slot only a) matters
(if applicable, i.e. min_align_mask != 0), which may appear strange if not
thoroughly documented.

In our opinion the first step towards getting this right is to figure out what
the different kinds of alignments are really supposed to mean. For each of the
mechanisms we need to understand and document whether it is about making sure
that the bounce buffer does not stretch over more of certain units of memory
(pages, the iova granule, whatever), or about preserving the offset within a
certain unit of memory, and if the latter, to what extent (the least
significant n bits of orig_addr, dictated by the respective mask, or something
different).

Thank you for your help in advance!

Regards,
Halil and Niklas