RE: [PATCH] iommu/vt-d: Atomic breakdown of IOPT into finer granularity

From: Tian, Kevin
Date: Mon Aug 14 2023 - 23:17:57 EST


> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Tuesday, August 15, 2023 10:06 AM
>
> [Please allow me to include Kevin and Alex in this thread.]
>
> On 2023/8/14 20:10, Jie Ji wrote:
> > With the addition of IOMMU support for I/O page faults, it's now possible
> > to unpin memory that is under DMA remapping. However, the lack of support
> > for unmapping a subrange of the I/O page table (IOPT) in the IOMMU can lead
> > to some issues.
>
> Is this the right contract for how iommu_map/unmap() should be used?
> If I remember correctly, IOVA ranges should be mapped and unmapped in
> pairs. That means, if a range is mapped by iommu_map(), the same range
> should be unmapped with iommu_unmap().
>
> Any misunderstanding or anything changed?
>
> >
> > For instance, a virtual machine can establish 2M/1G mappings in the IOPT for
> > better performance, while the host system enables swap and attempts to swap
> > out some 4K pages. Unfortunately, unmapping a subrange of the large-page
> > mapping will make the IOMMU page walk hit an error and finally cause a
> > kernel crash.
>
> Sorry that I can't fully understand this use case. Are you talking about
> nested translation, where user space manages its own I/O page
> tables? But how can those pages be swapped out?
>

It's not related to nested translation. I think they are interested in I/O page
faults at stage-2, so there is no need to pin the guest memory.

But I don't think this patch alone makes any sense. It should be part of
a bigger series which enables iommufd to support stage-2 page faults, e.g.
iommufd will register a fault handler on the stage-2 hwpt which first calls
handle_mm_fault() to fix the CPU page table and then calls iommu_map() to
set up the IOVA mapping. Then, upon an mmu notifier for any host mapping
change from mm, iommufd calls iommu_unmap() or other helpers to
adjust the IOVA mapping accordingly.
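
To make the ordering concrete, here is a rough, purely illustrative sketch
of that flow (not working code and not part of any existing series). The
struct s2_hwpt and the iopf_to_hva()/hva_to_pfn_nopin()/hva_to_iova()
helpers are made up for illustration; only handle_mm_fault(),
iommu_map()/iommu_unmap() and the mmu interval notifier are real kernel
interfaces:

#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/pfn.h>

struct s2_hwpt {                                /* hypothetical hwpt bookkeeping */
        struct iommu_domain *domain;
        struct mm_struct *mm;
        struct mmu_interval_notifier notifier;
};

/* Hypothetical helpers assumed to be provided by the real series: */
unsigned long iopf_to_hva(struct s2_hwpt *hwpt, unsigned long iova);
unsigned long hva_to_iova(struct s2_hwpt *hwpt, unsigned long hva);
unsigned long hva_to_pfn_nopin(struct mm_struct *mm, unsigned long hva);

/* Fault path: fix the CPU page table first, then install the IOVA mapping. */
static int s2_iopf_handler(struct s2_hwpt *hwpt, unsigned long iova,
                           size_t size, int prot)
{
        unsigned long hva = iopf_to_hva(hwpt, iova);
        struct vm_area_struct *vma;
        unsigned long pfn;
        vm_fault_t ret;

        mmap_read_lock(hwpt->mm);
        vma = vma_lookup(hwpt->mm, hva);
        if (!vma)
                goto err_unlock;

        /* 1) populate/fix the CPU page table for this hva */
        ret = handle_mm_fault(vma, hva, FAULT_FLAG_WRITE, NULL);
        if (ret & VM_FAULT_ERROR)
                goto err_unlock;

        /* 2) look up the resulting pfn without pinning the page */
        pfn = hva_to_pfn_nopin(hwpt->mm, hva);
        mmap_read_unlock(hwpt->mm);

        return iommu_map(hwpt->domain, iova, PFN_PHYS(pfn), size, prot,
                         GFP_KERNEL);

err_unlock:
        mmap_read_unlock(hwpt->mm);
        return -EFAULT;
}

/*
 * Teardown path: when the host mapping changes (swap-out, migration,
 * munmap, ...), the mmu notifier tears down the corresponding IOVA range.
 * Sequence/locking details are omitted in this sketch.
 */
static bool s2_invalidate(struct mmu_interval_notifier *mni,
                          const struct mmu_notifier_range *range,
                          unsigned long cur_seq)
{
        struct s2_hwpt *hwpt = container_of(mni, struct s2_hwpt, notifier);
        unsigned long iova = hva_to_iova(hwpt, range->start);

        mmu_interval_set_seq(mni, cur_seq);
        iommu_unmap(hwpt->domain, iova, range->end - range->start);
        return true;
}

static const struct mmu_interval_notifier_ops s2_mni_ops = {
        .invalidate = s2_invalidate,
};

The point of the sketch is only the ordering: CPU page table first via
handle_mm_fault(), then iommu_map(); and iommu_unmap() driven by the
notifier rather than by a user unmap request.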

The io_pagetable metadata which tracks user requests is unchanged
in that process.

The vfio driver needs to report to iommufd whether a bound device can fully
support I/O page faults for all DMA requests (beyond what PCI PRI allows).
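
As a purely hypothetical illustration (no such interface exists today), that
report could be as simple as a per-device capability set at bind time:

/* Hypothetical, for illustration only: a capability the vfio driver sets
 * when *every* DMA the device issues can be faulted and restarted, i.e.
 * not just the subset that PCI PRI covers. */
struct vfio_device_iopf_caps {
        bool full_iopf;         /* all DMA is recoverable via I/O page fault */
};

iommufd could then refuse to leave a stage-2 hwpt unpinned unless every
bound device reports that capability.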

There is a lot to do before we need to take time to review this
iommu-driver-specific change.