Re: [PATCH v3 09/16] iommu: Introduce guest PASID bind function

From: Jacob Pan
Date: Wed May 22 2019 - 13:15:20 EST


On Wed, 22 May 2019 16:05:53 +0100
Jean-Philippe Brucker <jean-philippe.brucker@xxxxxxx> wrote:

> On 21/05/2019 23:50, Jacob Pan wrote:
> >>> /**
> >>> * struct gpasid_bind_data - Information about device and guest
> >>> PASID binding
> >>> * @version: Version of this data structure
> >>> * @format: PASID table entry format
> >>> * @flags: Additional information on guest bind request
> >>> * @gpgd: Guest page directory base of the guest mm to bind
> >>> * @hpasid: Process address space ID used for the guest mm
> >>> in host IOMMU
> >>> * @gpasid: Process address space ID used for the guest mm
> >>> in guest IOMMU
> >>
> >> Trying to understand the full flow:
> >> * @gpasid is the one allocated by the guest using a virtual
> >> command. The guest writes @gpgd into the virtual PASID table at
> >> index @gpasid, then sends an invalidate command to QEMU.
> > yes
> >> * QEMU issues a gpasid_bind ioctl (on the mdev or its container?).
> >> VFIO forwards. The IOMMU driver installs @gpgd into the PASID table
> >> using @hpasid, which is associated with the auxiliary domain.
> >>
> >> But why do we need the @hpasid field here? Does userspace know
> >> about it at all, and does VFIO need to pass it to the IOMMU driver?
> >>
> > We need to support two guest-host PASID mappings through this API.
> > Idea comes from Kevin & Yi.
> > 1. identity mapping between host and guest PASID
> > 2. guest owns its own pasid space
> >
> > For option 1, which will plan to support first in this series.
> > There is no need for gpasid field since gpasid=hpasid. Guest
> > allocates PASID using virtual command interface which gets a host
> > PASID. Then PASID cache invalidation in the guest will result in
> > bind_gpasid(), @gpasid is not valid in the bind data (indicated by
> > the IOMMU_SVA_GPASID_VAL flag).
> >
> > For option 2, guest still uses virtual command to allocate guest
> > pasid, but this time QEMU does the allocation for gpasid, at the
> > same time QEMU will allocate a host pasid then maintain a G->H
> > PASID lookup. When guest invalidate its PASID cache with GPASID,
> > QEMU will find the match host PASID then pass both gpasid and
> > hpasid down to the host IOMMU driver.
> > Host IOMMU driver will store the gpgd at the hpasid entry but keep
> > track of the gpasid->hpasid mapping. Host will never program gpasid
> > in the IOMMU HW. Host IOMMU driver provides G->H PASID translation
> > for PF device drivers that emulates mdev config space, i.e. virtual
> > device composition module
> > (https://events.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf).
> >
> > These two options is a per VM choice. Hopefully the two diagrams
> > below can help to explain. I will put them in the next patch
> > headers.
>
> Thanks for the explanation, makes sense to me now. So the host kernel
> needs to know G->H because the guest may write GPASID into the config
> space emulated by the host device driver, and device driver then
> retrieves the HPASID via an iommu_ops callback? But the device driver
> keeps track of aux domains so isn't HPASID retrievable with
> aux_get_pasid() already?
>
aux_get_pasid() will get domain's default pasid, which is used for
non-svm traffic on mdev. Here the gpasid bind is for svm only.
> >
> > Option 1. Identity G-H PASID mapping diagram.
> >
> > .-------------. .---------------------------.
> > | vIOMMU | | Guest process mm, FL only |
> > | | '---------------------------'
> > .----------------/
> > | PASID Entry |--- PASID cache flush -
> > '-------------'\ |
> > | | \ |
> > | | \ |
> > '-------------' \________________ |
> > GPASID = HPASID |
> > Guest ^ ^ |
> > ------| Shadow |-------| VCMD |-----------|------------
> > v v | | |
> > QEMU v v |
> > ------------------------------------------|------------
> > Host HPASID = ioasid_alloc() |
> > | v
> > | sva_bind_gpasid(HPASID)
> > |
> > .-------------. | .----------------------.
> > | pIOMMU | | | Bind FL for GVA-GPA |
> > | | | /'----------------------'
> > .----------------' |
> > | PASID Entry | V (Nested xlate)
> > '----------------..---------------------.
> > | | |Set SL to GPA-HPA |
> > | | '---------------------'
> > '-------------'
> >
> >
> >
> > Option 2. Non-identity G-H PASID mapping diagram.
> >
> > .-------------. .---------------------------.
> > | vIOMMU | | Guest process mm, FL only |
> > | | '---------------------------'
> > .----------------/
> > | PASID Entry |--- PASID cache flush -
> > '-------------'\ | .-------------.
> > | | \ | |Guest driver |
> > | | \ | |writes GPASID|
> > '-------------' \________________ | '-------------'
> > GPASID | |
> > Guest ^ ^ | |
> > ------| Shadow |-------| VCMD |-----------|------------ |
> > v v | | | |
> > QEMU v v | |
> > GPASID = qemu_gpasid_alloc() | |
> > keep G->H PASID lookup | |
> > ^ v |
> > | lookup G->H PASID |
> > -------------------|----------------------|------------ |
> > Host HPASID = ioasid_alloc() | |
> > | v |
> > | sva_bind_gpasid(HPASID,GPASID)|
> > | keep H-G PASID lookup |
> > | \
> > -------------------. .-------------. | .----------------------.
> > \| VDCM | | pIOMMU | | | Bind FL for GVA-GPA |
> > | H = lookup(GPASID)| | | | /'----------------------'
> > | write H to dev | .----------------' |
> > '------------------' | PASID Entry | V (Nested xlate)
> > '----------------..---------------------.
> > | | |Set SL to GPA-HPA |
> > | | '---------------------'
> > '-------------'
> > There is also implications in G-H pasid lookup for PRQ, that would
> > be in the later series.
> >
> >>> * @addr_width: Guest address width. Paging mode can also
> >>> be derived.
> >>
> >> What does the last sentence mean? @addr_width should probably be in
> >> @vtd if it provides implicit information.
> >>
> > Derive 4 or 5 level paging mode from the address width. It can be in
> > @vtd but i thought this can be generic.
>
> Yes I think it's generic enough. It may be worth stating that this is
> the *virtual* address width, and removing or clarifying what the
> paging mode is (the sentence could be confusing on Arm, as we have
> different page granules which cannot be derived from the address
> width)
>
OK, will keep addr_width as a generic field, then remove the paging
mode comment.

Thanks,

Jacob