Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

From: Raj, Ashok
Date: Fri Nov 13 2020 - 12:38:50 EST


On Fri, Nov 13, 2020 at 08:12:39AM -0800, Luck, Tony wrote:
> > Of course is this not only an x86 problem. Every architecture which
> > supports virtualization has the same issue. ARM(64) has no way to tell
> > for sure whether the machine runs bare metal either. No idea about the
> > other architectures.
>
> Sounds like a hypervisor problem. If the VMM provides perfect emulation
> of every weird quirk of h/w, then it is OK to let the guest believe that it is
> running on bare metal.

That's true, which is why there is no immutable bit in cpuid, or anywhere
else, telling you that you are running under a hypervisor. Providing
something like that would make certain features non-virtualizable.
Apparently, before we had cpuid faulting, what the guest saw was the
real raw cpuid.

Disclaimer: I'm not saying this is perfect, I'm just relaying the
reasoning behind it. Not trying to defend it... flames > /dev/null
>
> If it isn't perfect, then it should make sure the guest knows *for sure*, so that
> the guest can take appropriate actions to avoid the sharp edges.
>

There are indeed 2 problems to solve.

1. How does a device driver know whether the device is IMS capable?

IMS is a device attribute. Each vendor can provide its own method of
indicating it. One such mechanism is the DVSEC.SIOV.IMS property. Some
might believe this is for use only by Intel. For DVSEC, I don't believe
the vendor ID implies that kind of tie, unlike the device vendor ID in
the standard header. TBH, there are other device vendors using the exact
same method to indicate SIOV and IMS properties. What a DVSEC vendor ID
states is "As defined by Vendor X".

The reason we chose config space over something in device-specific MMIO
is that VFIO, being the one common mechanism today, exposes only known
standard headers and some extended headers to the guest. When we expose
a full PF, the guest doesn't see the DVSEC, so drivers know IMS isn't
available.

This is our mechanism to stop drivers from calling
pci_ims_array_create_msi_irq_domain(). It may not be perfect for all
devices; it is a device-specific mechanism. For the devices under
consideration, which follow the SIOV spec, it meets the spirit of the
requirement even without #2 below. When devices have no way to detect
this, #2 is required as a second way to block IMS.

2. How does the platform component (IOMMU) indicate whether it can support
all forms of IMS (on device, or in memory)?

On device would require some form of trap/emulate. Legacy MSI-X already
has that solved, but for a device-specific store you need some
additional work.

When it's system memory (say IMS is in GPA space), you need some form of
hypercall. There is no way around it since we can't intercept. Yes, you
can maybe map those as RO and trap, but it's not pretty.

To solve this, rather than a generic platform capability, maybe we should
flip this to the IOMMU instead, because that's the one that offers this
capability today.

iommu_ims_supported()
When the platform has no IOMMU or no hypervisor calls, it returns
false. So a device driver can tell whether the platform supports
IMS, even if the device supports IMS capability detection.

On platforms where the IOMMU supports this capability:

Either there is a vIOMMU with a Virtual Command Register that can
provide a way to get the interrupt handle, similar to what you would
get from a hypercall for instance. Or there is a real hypercall
that supports giving the guest OS the physical IRTE handle.
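
A rough sketch of the decision described above, only to illustrate the
idea. iommu_ims_supported() is the proposed name from this mail and does
not exist as a kernel API; struct iommu_ims_info, its fields and
driver_may_use_ims() are invented for this example. It models only the
cases spelled out here (no IOMMU, vIOMMU with a Virtual Command
Register, or a hypercall) and makes no claim about how a real IOMMU
driver would discover them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the (v)IOMMU driver discovered. */
struct iommu_ims_info {
	bool iommu_present;     /* an IOMMU or vIOMMU is visible at all */
	bool vcmd_irte_handle;  /* Virtual Command Register hands out a handle */
	bool hcall_irte_handle; /* hypercall returns the physical IRTE handle */
};

/* Platform side (#2): can some component provide an interrupt handle? */
static bool iommu_ims_supported(const struct iommu_ims_info *info)
{
	assert(info != NULL);

	if (!info->iommu_present)
		return false;

	return info->vcmd_irte_handle || info->hcall_irte_handle;
}

/* Driver side: both the device (#1) and the platform (#2) must agree. */
static bool driver_may_use_ims(bool device_reports_ims,
			       const struct iommu_ims_info *info)
{
	return device_reports_ims && iommu_ims_supported(info);
}
```

A driver would then create its IMS irq domain only when
driver_may_use_ims() is true, and fall back to MSI-X otherwise.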


--
Cheers,
Ashok

[Forgiveness is the attribute of the STRONG - Gandhi]