Re: [RFC PATCH 00/21] iommu/amd: Introduce support for HW accelerated vIOMMU w/ nested page table

From: Jason Gunthorpe
Date: Mon Jun 26 2023 - 09:20:52 EST


On Fri, Jun 23, 2023 at 07:08:54PM -0700, Suthikulpanit, Suravee wrote:
> > > The IOMMU hardware uses the PAS for storing Guest IOMMU information such as
> > > Guest MMIOs, DevID Mapping Table, DomID Mapping Table, and Guest
> > > Command/Event/PPR logs.
> >
> > Why does it have to be in kernel memory?
> >
> > Why not store the whole thing in user mapped memory and have the VMM
> > manipulate it directly?
>
> The Guest MMIO and CmdBuf Dirty Status are allocated per IOMMU instance, so
> these data structures cannot be allocated by the VMM.

Yes, it is unfortunate that so much stuff here wasn't 4k aligned so it
could be mapped sensibly. It doesn't really make any sense to have a
giant repeated register map that still has to be hypervisor trapped, a
command queue would have been more logical :(

> In this case, the IOMMUFD_CMD_MMIO_ACCESS might still be needed.

It seems this is unavoidable, but it needs a clearer name and purpose.

But more importantly we don't really have any object to hang this off
of - we don't have the notion of a "VM" in iommufd right now.

We had sort of been handwaving that maybe the entire FD is a "VM" and
maybe that works for some scenarios, but I don't think it works for
what you need, especially if you consider multi-instance.

So, it is good that you brought this series right now as I think it
needs harmonizing with what ARM needs to do, and this is the more
complex version of the two.

> The DomID and DevID mapping tables are allocated per-VM:
> * DomID Mapping Table (512 KB contiguous memory)
> * DevID Mapping Table (1 MB contiguous memory)

But these can be mapped into that IPA space at 4k granularity?
They just need contiguous IOVA? So the VMM could provide this memory
and we don't need calls to manipulate it?
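
To make the sizes concrete, here is a rough userspace sketch of what
"the VMM provides this memory" could look like. It only assumes the
table sizes quoted above and 4k mapping granularity; pages_4k() and
vmm_alloc_table() are hypothetical helper names, not existing iommufd
or AMD driver API, and the step that pins the buffer and maps it to
contiguous IOVA is left out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE_4K   4096UL
#define DOMID_TBL_SIZE (512UL * 1024)   /* DomID Mapping Table, size as quoted */
#define DEVID_TBL_SIZE (1024UL * 1024)  /* DevID Mapping Table, size as quoted */

/* Number of 4k pages needed to back a table of 'bytes' bytes. */
static size_t pages_4k(size_t bytes)
{
	return (bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/*
 * Hypothetical VMM-side helper: allocate zero-filled, page-aligned
 * anonymous memory for one table. The buffer is only virtually
 * contiguous; it would still have to be pinned and mapped to
 * contiguous IOVA by the kernel. Returns NULL on failure.
 */
static void *vmm_alloc_table(size_t bytes)
{
	void *p;

	assert(bytes % PAGE_SIZE_4K == 0);  /* tables are whole 4k pages */
	p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

With these sizes the DomID table is 128 pages and the DevID table is
256 pages, so neither needs physically contiguous memory if each 4k
page can be mapped individually.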

> Let's say we can use IOMMU_SET_DEV_DATA to communicate the memory address of
> Dom/DevID Mapping tables to IOMMU driver to pin and map in the PAS IOMMU
> page table. Then, this might work. Does that go along the lines of what you
> are thinking (mainly to try to avoid introducing additional ioctl)?

I think it makes more sense if memory that is logically part of the
VMM is mmap'd to the VMM. Since we have the general design of passing
user pointers and pinning them, it makes some sense. You could do the
same trick as your IPA space and use an IPA IOAS plus an access to set
this all up.

This has the same issue as above: it needs some formal VM object, as
fundamentally you are asking the driver to allocate a limited resource
on a specific IOMMU instance and then link that to other actions.
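
As a toy model only, the shape of the problem is roughly the following.
Every name here (vm_object, iommu_instance, the slot counts) is made up
for illustration and is not a proposed uAPI or an existing kernel
structure. The point is just that the per-instance allocation needs
some object to live in so that later actions can find it.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_INSTANCES   2  /* hypothetical number of IOMMU instances */
#define SLOTS_PER_INST 2  /* hypothetical per-instance resource limit */

/* One physical IOMMU instance with a small, fixed pool of slots. */
struct iommu_instance {
	bool slot_used[SLOTS_PER_INST];
};

/* Hypothetical "VM" object: remembers which slot it holds on each instance. */
struct vm_object {
	int slot[NR_INSTANCES];  /* -1 means nothing allocated on that instance */
};

static void vm_object_init(struct vm_object *vm)
{
	for (size_t i = 0; i < NR_INSTANCES; i++)
		vm->slot[i] = -1;
}

/*
 * Return the slot this VM holds on instance 'idx', allocating one on
 * first use. Returns -1 if 'idx' is out of range or the instance has
 * no free slots left.
 */
static int vm_object_get_slot(struct vm_object *vm,
			      struct iommu_instance *insts, size_t idx)
{
	if (idx >= NR_INSTANCES)
		return -1;
	if (vm->slot[idx] >= 0)
		return vm->slot[idx];  /* already linked to this instance */

	for (int s = 0; s < SLOTS_PER_INST; s++) {
		if (!insts[idx].slot_used[s]) {
			insts[idx].slot_used[s] = true;
			vm->slot[idx] = s;
			return s;
		}
	}
	return -1;  /* limited resource exhausted on this instance */
}
```

Treating the whole FD as the "VM" would be the degenerate case with a
single vm_object per FD. With multi-instance, the per-instance state
is what needs a proper home.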

Jason