RE: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

From: Tian, Kevin
Date: Thu Nov 12 2020 - 21:42:10 EST


> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Friday, November 13, 2020 6:43 AM
>
> On Thu, Nov 12 2020 at 14:32, Konrad Rzeszutek Wilk wrote:
> >> 4. Using CPUID to detect running as guest. But as Thomas pointed out, this
> >> approach is less reliable as not all hypervisors do this way.
> >
> > Is that truly true? It is the first time I see the argument that extra
> > steps are needed and that checking for X86_FEATURE_HYPERVISOR is not
> > enough.
> >
> > Or is it more "Some hypervisor probably forgot about it, so lets make
> > sure we patch over that possible hole?"
>
> Nothing enforces that bit to be set. The bit is a pure software
> convention and was proposed by VMWare in 2008 with the following
> changelog:
>
> "This patch proposes to use a cpuid interface to detect if we are
> running on an hypervisor.
>
> The discovery of a hypervisor is determined by bit 31 of CPUID#1_ECX,
> which is defined to be "hypervisor present bit". For a VM, the bit is
> 1, otherwise it is set to 0. This bit is not officially documented by
> either Intel/AMD yet, but they plan to do so some time soon, in the
> meanwhile they have promised to keep it reserved for virtualization."
>
> The reserved promise seems to hold. AMDs APM has it documented. The
> Intel SDM not so.
>
> Also the kernel side of KVM does not enforce that bit, it's up to the user
> space management to set it.
>
> And yes, I've tripped over this with some hypervisors and even qemu KVM
> failed to set it in the early days because it was masked with host CPUID
> trimming as there the bit is obviously 0.
>
> DMI vendor name is pretty good final check when the bit is 0. The
> strings I'm aware of are:
>
> QEMU, Bochs, KVM, Xen, VMware, VMW, VMware Inc., innotek GmbH,
> Oracle Corporation, Parallels, BHYVE, Microsoft Corporation
>
> which is not complete but better than nothing ;)
>
> Thanks,
>
> tglx

Hi, Thomas,

CPUID#1_ECX is an x86 thing. Do we need to figure out
probably_on_bare_metal for every architecture at once, or is it OK to
handle only x86 at this stage? Based on previous discussions, IMS is
just one piece of multiple technologies needed to enable SIOV-like
scalability. Arch-specific enablement beyond IMS (e.g. the IOMMU
part) will be required for such scaled usage anyway, so we could
leave IMS disabled for non-x86 and figure out the arch-specific
probably_on_bare_metal when that work happens?
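
To make the x86-only option concrete, here is a rough userspace sketch
of the check order described above: the CPUID#1_ECX bit 31 test first,
then the DMI vendor name as the final check when the bit is 0. It is
only an illustration under assumptions, not the patch itself. The
helpers dmi_vendor_is_hypervisor() and cpuid_hypervisor_bit() are
hypothetical names, the exact-string comparison is my assumption, and
the vendor list is simply the (incomplete) one quoted above. In the
kernel the inputs would presumably come from X86_FEATURE_HYPERVISOR
and the DMI system vendor field instead. Non-x86 conservatively
reports "not bare metal", which matches leaving IMS disabled there.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define SKETCH_ARCH_X86 1
#else
#define SKETCH_ARCH_X86 0
#endif

/* DMI vendor strings from the list quoted above (known to be incomplete). */
static const char *const hv_dmi_vendors[] = {
	"QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
	"innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE",
	"Microsoft Corporation",
};

/* Hypothetical helper: exact match of a vendor string against the list. */
bool dmi_vendor_is_hypervisor(const char *vendor)
{
	size_t i;

	if (!vendor)
		return false;
	for (i = 0; i < sizeof(hv_dmi_vendors) / sizeof(hv_dmi_vendors[0]); i++) {
		if (strcmp(vendor, hv_dmi_vendors[i]) == 0)
			return true;
	}
	return false;
}

/* Hypothetical helper: CPUID leaf 1, ECX bit 31 ("hypervisor present"). */
bool cpuid_hypervisor_bit(void)
{
#if SKETCH_ARCH_X86
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return (ecx >> 31) & 1u;
#else
	return false;	/* no such bit outside x86 */
#endif
}

/*
 * Userspace analogue of probably_on_bare_metal(); the caller passes in
 * the DMI system vendor string (may be NULL if unavailable).
 */
bool probably_on_bare_metal(const char *dmi_vendor)
{
	if (!SKETCH_ARCH_X86)
		return false;	/* no detection yet: keep IMS disabled */
	if (cpuid_hypervisor_bit())
		return false;	/* software convention says "guest" */
	/* Bit is 0: nothing enforces it, so use the vendor as final check. */
	return !dmi_vendor_is_hypervisor(dmi_vendor);
}
```

On Linux the vendor string for such an experiment could be read from
/sys/class/dmi/id/sys_vendor, with the trailing newline stripped
before calling probably_on_bare_metal().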

Thanks
Kevin