Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg

From: Nicolin Chen
Date: Tue Jan 23 2024 - 19:12:05 EST


On Mon, Jan 22, 2024 at 01:57:00PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 22, 2024 at 09:24:08AM -0800, Nicolin Chen wrote:
> > > Or do we need to measure at boot time invalidation performance and set
> > > a threshold that way?
> >
> > I see. We can run an invalidation at the default max_tlbi_ops to
> > measure its delay in msec or usec, and then set that delay as the
> > "xx ms" threshold in the idea above.
> >
> > > Also, it seems to me that SVA use cases and, say, DMA API cases are
> > > somewhat different, in that we may be willing to wait longer for DMA API.
> >
> > Hmm, the lockup that my patch fixed was for an SVA case that
> > doesn't seem to involve DMA API:
> > https://lore.kernel.org/linux-iommu/20230901203904.4073-1-nicolinc@xxxxxxxxxx/
> >
> > And the other lockup fix for a non-SVA case from Zhang doesn't
> > seem to involve DMA API either:
> > https://lore.kernel.org/linux-iommu/e74ea905-d107-4202-97ca-c2c509e7aa1e@xxxxxxxxxx/
> >
> > Maybe we can treat DMA API a bit differently. But I am not sure
> > about the justification for letting it wait longer. Mind
> > elaborating?
>
> Well, there are two issues.. The first is the soft lockup, that should
> just be reliably prevented. The timer, for instance, is a reasonable
> stab at making that universally safe.
>
> Then there is the issue of just raw invalidation performance, where
> SVA particularly is linked to the mm and the longer invalidation takes
> the slower the apps will be. We don't have any idea where future DMA
> might hit the cache, so it is hard to know if invalidating everything
> isn't actually the right thing..
>
> DMA API is often lazy and the active DMA is a bit more predictable, so
> perhaps there is more merit in spending more time to narrow the
> invalidation.
>
> The other case was vfio unmap for VM tear down, which ideally would
> use whole ASID invalidation.

I see! Then we'd need a flag to pass into __iommu_dma_unmap or
so: if the caller comes from dma-iommu.c, do the longer
per-page invalidation.
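
To make it concrete, a rough sketch of what I am picturing (the
hint enum and the wrapper here are made up, nothing like this
exists upstream):

	/* Hypothetical: a granularity hint that unmap callers could pass down */
	enum iommu_inv_hint {
		IOMMU_INV_HINT_DEFAULT,		/* driver's size-based heuristic */
		IOMMU_INV_HINT_PER_PAGE,	/* caller tolerates slower, precise inval */
	};

	/* dma-iommu.c would opt in to the longer per-page invalidation: */
	static size_t iommu_unmap_hinted(struct iommu_domain *domain,
					 unsigned long iova, size_t size,
					 enum iommu_inv_hint hint)
	{
		/* a driver would use @hint to pick per-page vs. full-ASID inval */
		return iommu_unmap(domain, iova, size);
	}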

> If your issue is softlockup, not performance, then that should be

We have both issues.

> prevented strongly. Broadly speaking if SVA is pushing too high an
> invalidation workload then we need to agressively trim it, and do so
> dynamically. Certainly we should not have a tunable that has to be set
> right to avoid soft lockup.
>
> A tunable to improve performance, perhaps, but not to achieve basic
> correctness.

So, should we make the tunable optional, only for those who
care about performance? Though I think having a tunable would
fix both issues anyway.
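
If we do go the tunable route, I was thinking of something as
simple as a module parameter, e.g. (the name is made up, purely
illustrative):

	/*
	 * Hypothetical knob, not an existing parameter: number of
	 * page-granule TLBIs above which the driver falls back to a
	 * full-ASID invalidation. 0 = derive a default automatically.
	 */
	static unsigned long tlbi_range_threshold;
	module_param(tlbi_range_threshold, ulong, 0644);
	MODULE_PARM_DESC(tlbi_range_threshold,
		"Max per-page TLBIs before falling back to ASID invalidation (0 = auto)");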

> Maybe it is really just a simple thing - compute how many invalidation
> commands are needed, if they don't all fit in the current queue space,
> then do an invalidate all instead?

The queue itself can actually be quite large. But one large
invalidation gets divided into batches that have to execute
back-to-back. And the batch size is 64 commands in the 64-bit
case, which might be too small as a cap.
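
To put numbers on it: CMDQ_BATCH_ENTRIES is BITS_PER_LONG,
i.e. 64 on arm64. Invalidating, say, 2MB at a 4K granule takes
2MB / 4KB = 512 TLBI commands, i.e. 512 / 64 = 8 back-to-back
batches; a multi-GB unmap turns into thousands of batches,
which is where the soft lockup comes from.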

Thanks
Nicolin