[RFC PATCH 0/3] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO

From: Shivaprasad G Bhat
Date: Tue Mar 12 2024 - 14:14:53 EST


This patch set fills the missing gaps on pSeries for the VFIO SPAPR TCE
sub-driver, thereby re-enabling VFIO support on POWER pSeries
machines.

Structure of the patchset
=========================

* Currently, due to [1], the VFIO_IOMMU_{MAP,UNMAP}_DMA ioctls are
broken on pSeries, which only supports single-level TCE tables. The
first patch addresses this by restoring the lost functionality.
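To illustrate what a single-level "userspace view" means, here is a
simplified, hypothetical C model. It is not kernel code: the struct and
function names are invented, and 0 is assumed to mean "unmapped". A
flat array parallel to the TCE table records the user address mapped
by each entry (the role played by it_userspace in the kernel), so that
an unmap can hand that address back for unpinning.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a single-level TCE table plus its userspace view. */
struct tce_table_model {
	uint64_t offset;      /* first entry index (bus offset >> page_shift) */
	uint64_t size;        /* number of TCE entries */
	unsigned page_shift;  /* IOMMU page size, e.g. 16 for 64K pages */
	uint64_t *userspace;  /* user address per entry, 0 = unmapped */
};

static int tbl_init(struct tce_table_model *tbl, uint64_t offset,
		    uint64_t size, unsigned page_shift)
{
	tbl->offset = offset;
	tbl->size = size;
	tbl->page_shift = page_shift;
	tbl->userspace = calloc(size, sizeof(*tbl->userspace));
	return tbl->userspace ? 0 : -ENOMEM;
}

/* Single level: the view is a flat array indexed by (entry - offset). */
static uint64_t *tbl_ua_ptr(struct tce_table_model *tbl, uint64_t iova)
{
	uint64_t entry = iova >> tbl->page_shift;

	if (entry < tbl->offset || entry - tbl->offset >= tbl->size)
		return NULL;
	return &tbl->userspace[entry - tbl->offset];
}

/* Record the user address backing one IOMMU page. */
static int tbl_map(struct tce_table_model *tbl, uint64_t iova, uint64_t ua)
{
	uint64_t *slot = tbl_ua_ptr(tbl, iova);

	if (!slot)
		return -EINVAL;
	if (*slot)
		return -EBUSY;
	*slot = ua;
	return 0;
}

/* Clear the entry and return the user address so the caller can unpin it. */
static uint64_t tbl_unmap(struct tce_table_model *tbl, uint64_t iova)
{
	uint64_t *slot = tbl_ua_ptr(tbl, iova);
	uint64_t ua;

	if (!slot)
		return 0;
	ua = *slot;
	*slot = 0;
	return ua;
}
```

Without such a view, an unmap has no record of which user page to
release, which is why the single-level case needs it as well.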

On pSeries, the DMA windows (the default 32-bit window and the 64-bit
DDW) are "borrowed" between the host and the VFIO driver. The necessary
mechanism has been in place since [2]. However, the VFIO SPAPR TCE
sub-driver does not open the 64-bit window if a host driver such as
NVMe has not already done so. As a result, user space gets access only
to the default 32-bit DMA window. This is a problem for devices that
have no host kernel driver and depend entirely on the VFIO user-space
interface for DMA: the VFIO_IOMMU_SPAPR_TCE_CREATE ioctl currently just
returns EPERM without attempting to open the second window for them.
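A minimal user-space sketch of requesting the 64-bit window, assuming
the uapi definitions of struct vfio_iommu_spapr_tce_create and
VFIO_IOMMU_SPAPR_TCE_CREATE from <linux/vfio.h>. The helper names, the
single-level request (levels = 1) and the example sizes are
illustrative assumptions; container_fd is assumed to be a VFIO
container already set to the SPAPR TCE IOMMU type.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: fill a window-creation request. */
static struct vfio_iommu_spapr_tce_create
make_create_req(uint32_t page_shift, uint64_t window_size)
{
	struct vfio_iommu_spapr_tce_create req;

	memset(&req, 0, sizeof(req));      /* flags and reserved fields = 0 */
	req.argsz = sizeof(req);
	req.page_shift = page_shift;       /* IOMMU page size as a shift */
	req.window_size = window_size;     /* window size in bytes */
	req.levels = 1;                    /* ask for a single-level table */
	return req;
}

/*
 * Hypothetical helper: try to open the 64-bit window. Returns 0 and
 * stores the window's bus start address in *start, or a negative errno.
 */
static int try_create_ddw(int container_fd, uint32_t page_shift,
			  uint64_t window_size, uint64_t *start)
{
	struct vfio_iommu_spapr_tce_create req =
		make_create_req(page_shift, window_size);

	if (ioctl(container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &req) < 0)
		return -errno;
	*start = req.start_addr;
	return 0;
}
```

Per the description above, on a kernel without this series such a call
would be expected to fail with -EPERM for a device whose DDW was never
opened by a host driver, leaving only the default 32-bit window.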

* The second patch only moves pSeries-specific functions from the arch
iommu code to the pSeries platform iommu file. This is needed because
the DMA window manipulation operations introduced in the third patch
depend on these functions and are entirely pSeries-specific.

* The third patch adds the support needed to open the 64-bit DMA window
on a VFIO_IOMMU_SPAPR_TCE_CREATE ioctl from user space. It also
collects the DDW information from the platform and exposes it to user
space through 'struct vfio_iommu_spapr_tce_ddw_info'.
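Once the DDW information is exposed, user space can use it to choose a
page size for the window. A sketch, assuming the uapi struct
vfio_iommu_spapr_tce_info (which embeds struct
vfio_iommu_spapr_tce_ddw_info as its ddw member), the
VFIO_IOMMU_SPAPR_INFO_DDW flag, and that each set bit in ddw.pgsizes is
a supported page size in bytes. pick_ddw_page_shift() and its
max_shift cap are hypothetical.

```c
#include <stdint.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: return the largest supported page shift that does
 * not exceed max_shift, or -1 if DDW is not reported or nothing fits.
 */
static int pick_ddw_page_shift(const struct vfio_iommu_spapr_tce_info *info,
			       int max_shift)
{
	int shift;

	if (!(info->flags & VFIO_IOMMU_SPAPR_INFO_DDW))
		return -1;                 /* platform reported no DDW support */
	for (shift = max_shift; shift >= 0; shift--) {
		/* bit N set in pgsizes means a 2^N-byte page is supported */
		if (shift < 64 && (info->ddw.pgsizes & (1ULL << shift)))
			return shift;
	}
	return -1;
}
```

In practice the info struct would be filled by the
VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl, and the chosen shift passed as
page_shift when creating the window.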

Testing
=======
These patches were tested by attaching an NVMe disk to a nested KVM
guest running in a pSeries LPAR. In addition, vfio-test [3] by Alex
Williamson was forked, extended with pSeries guest support, and used
to test these patches [4].

Limitations/Known Issues
========================
* Does not work for SR-IOV VFs, as they have only one DMA window.
* Does not work for multi-function cards.
* Bugs
- mmdrop() in tce_iommu_release() when the container is detached
with unmaps still pending.

[1] Commit: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
[2] Commit: 9d67c9433509 ("powerpc/iommu: Add "borrowing" iommu_table_group_ops")
[3] https://github.com/awilliam/tests
[4] https://github.com/nnmwebmin/vfio-ppc-tests/tree/vfio-ppc-ex

---

Shivaprasad G Bhat (3):
powerpc/pseries/iommu: Bring back userspace view for single level TCE tables
powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
pseries/iommu: Enable DDW for VFIO TCE create


arch/powerpc/include/asm/iommu.h | 7 +-
arch/powerpc/kernel/iommu.c | 156 +-------
arch/powerpc/platforms/pseries/iommu.c | 514 ++++++++++++++++++++++++-
drivers/vfio/vfio_iommu_spapr_tce.c | 51 +++
4 files changed, 571 insertions(+), 157 deletions(-)