Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg

From: Nicolin Chen
Date: Thu Aug 31 2023 - 13:39:33 EST


On Wed, Aug 30, 2023 at 10:49:59PM +0100, Will Deacon wrote:

> On Tue, Aug 29, 2023 at 03:15:52PM -0700, Nicolin Chen wrote:
> > Meanwhile, re-reading Will's commit log:
> >
> >     arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE
> >
> >     In order to reduce the possibility of soft lock-ups, we bound the
> >     maximum number of TLBI operations performed by a single call to
> >     flush_tlb_range() to an arbitrary constant of 1024.
> >
> >     Whilst this does the job of avoiding lock-ups, we can actually be a bit
> >     smarter by defining this as PTRS_PER_PTE. Due to the structure of our
> >     page tables, using PTRS_PER_PTE means that an outer loop calling
> >     flush_tlb_range() for entire table entries will end up performing just a
> >     single TLBI operation for each entry. As an example, mremap()ing a 1GB
> >     range mapped using 4k pages now requires only 512 TLBI operations when
> >     moving the page tables as opposed to 262144 operations (512*512) when
> >     using the current threshold of 1024.
> >
> > I find that I am actually not quite getting the calculation at the
> > end, i.e. the comparison between 512 and 262144 operations.
> >
> > For a 4K pgsize setup, MAX_TLBI_OPS is set to 512, calculated from
> > 4096 / 8, i.e. PTRS_PER_PTE. Then, any VA range >= 2MB will trigger
> > a full-MM flush_tlb_mm(). With the old threshold of 1024, that 2MB
> > cutoff would instead be 4MB, i.e. the condition becomes range >= 4MB.
> >
> > So, it seems to me that requesting a 1GB invalidation will trigger
> > a full-MM flush in either case, whether the threshold is 2MB or 4MB?
> >
> > I can get that 262144 is the number of 4K pages in 1GB, so there
> > would be 262144 per-page invalidation operations if there were no
> > threshold to replace them with a full address-space invalidation.
> > Yet, that wasn't the case, since we had a 4MB threshold with the
> > arbitrary 1024 for MAX_TLBI_OPS?
>
> I think this is because you can't always batch up the entire range as
> you'd like due to things like locking concerns. For example,
> move_page_tables() can end up invalidating 2MiB at a time, which is
> too small to trigger the old threshold, and so you end up doing every
> single pte individually.

Thanks for elaborating! So, I see now that it was about the worst
case, where a 1GB range breaks down into 512 separate 2MB range
invalidations, i.e. 512 full-MM flush_tlb_mm() calls vs. 262144
per-page TLBI ops.
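
To spell out the math for my own sanity, the bound in
arch/arm64/include/asm/tlbflush.h works roughly like this (paraphrased
from memory, so the exact form may differ):

    /* CPU side: give up on per-page TLBIs past the bound */
    if ((end - start) >= (MAX_TLBI_OPS * stride))
            flush_tlb_mm(vma->vm_mm);       /* one op per chunk */

With 4K pages and a 1GB range moved in 2MB chunks:

    pages per chunk  = 2MB / 4KB = 512
    old bound (1024) : 512 < 1024 -> per-page TLBIs
                       512 chunks * 512 TLBIs = 262144 ops
    new bound (512)  : 512 >= 512 -> full-MM flush
                       512 chunks * 1 op      = 512 ops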

Though I have not dug into it deeply, I assume that this worst case
could happen to SVA too, since the IOTLB invalidation is driven from
the MMU code. But the same worst case might not happen on the non-SVA
pathway, i.e. TLBI ops for the IO page table don't really benefit
from this?
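
For context, if I am reading arm-smmu-v3-sva.c right, the SVA
invalidations really do arrive straight from the CPU MMU side via the
mmu_notifier, so they see the same chunked ranges as the CPU TLB.
Quoting the ops table from memory (note the .invalidate_range hook
has recently been renamed upstream to .arch_invalidate_secondary_tlbs):

    static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = {
            .invalidate_range = arm_smmu_mm_invalidate_range,
            .release          = arm_smmu_mm_release,
            .free_notifier    = arm_smmu_mmu_notifier_free,
    };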

With that question open, I take Robin's point that it wouldn't be
easy to decide on an arbitrary number, so we could just take the
worst case from the SVA pathway as the common threshold.

Then, since SVA uses the CPU page table, perhaps we should highlight
that SMMU sets the threshold by directly reusing the CPU page table's
MAX_TLBI_OPS rather than calculating it from the IO page table, even
though both are in the same format?
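
Something like this rough sketch, perhaps (hypothetical only, not the
actual patch; the helper names are the SMMUv3 ones as I remember them):

    /*
     * Hypothetical: reuse the CPU-side bound (PTRS_PER_PTE, i.e.
     * PAGE_SIZE / 8 with 64-bit descriptors) rather than deriving
     * one from the IO page table geometry.
     */
    #define ARM_SMMU_MAX_TLBI_OPS   PTRS_PER_PTE

    if (size >= ARM_SMMU_MAX_TLBI_OPS * granule)
            arm_smmu_tlb_inv_asid(smmu, asid);  /* full-ASID inv */
    else
            __arm_smmu_tlb_inv_range(&cmd, iova, size, granule,
                                     smmu_domain);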

Thank you
Nicolin