Re: [PATCH] x86/mce: Check for hypervisor before enabling additional error logging

From: Borislav Petkov
Date: Tue Nov 10 2020 - 04:56:26 EST


On Tue, Nov 10, 2020 at 09:50:43AM +0100, Paolo Bonzini wrote:
> 1) ignore_msrs _cannot_ be on by default. You cannot know in advance that
> for all non-architectural MSRs it's okay for them to read as zero and eat
> writes. For some non-architectural MSR which never reads as zero on real
> hardware, who knows that there isn't some code using the contents of the MSR
> as a divisor, and causing a division by zero exception with ignore_msrs=1?

So if you're emulating a certain type of hardware - say a certain CPU
model - then what are you saying? That you're emulating it but not
really all of it, just some bits?

Because this is what happens - the kernel checks that it runs on a
certain CPU type and this tells it that those MSRs are there. But then
comes virt and throws all assumptions out.

So if it emulates a CPU model and the kernel tries to access those MSRs,
then the HV should ignore those MSR accesses if it doesn't know about
them. Why should the kernel change every time some tool or virtualization
has shortcomings?

> 2) it's not just KVM. _Any_ hypervisor is bound to have this issue for some
> non-architectural MSRs. KVM just gets the flak because Linux CI
> environments (for obvious reasons) use it more than they use Hyper-V or ESXi
> or VirtualBox.

It's not flak - I'm trying to find a solution which is workable for
both. The kernel is not at fault here.

> 3) because of (1) and (2), the solution is very simple. If the MSR is
> architectural, its absence is a KVM bug and we'll fix it in all stable
> versions. If the MSR is not architectural (and 17Fh isn't; not only it's
> not mentioned in the SDM,

It is mentioned in the SDM.

> even Google is failing me), never ever assume that the CPUID
> family/model/stepping implies a given MSR is there, and just use
> rdmsr_safe/wrmsr_safe.

Yes, we don't have a choice, as always. ;-\
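
For illustration, the "probe, don't assume" pattern suggested above can be
sketched in plain userspace C. This is only a simulation under stated
assumptions: sim_rdmsr_safe(), sim_wrmsr_safe(), sim_msr_implemented and
the bit position are hypothetical stand-ins, not the kernel's code. They
only mirror the shape of the kernel's rdmsrl_safe()/wrmsrl_safe()
helpers, which return nonzero when the access faults.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MSR_ERROR_CONTROL 0x17fu      /* MSR number discussed in the thread */
#define SIM_EXTRA_LOGGING_BIT (1ULL << 1) /* illustrative bit, an assumption */

/* Simulated hypervisor state: does the guest-visible MSR exist at all? */
static int sim_msr_implemented;
static uint64_t sim_msr_value;

/* Returns 0 on success, -EIO where a real access would fault. */
static int sim_rdmsr_safe(uint32_t msr, uint64_t *val)
{
	assert(val != NULL);
	if (msr != SIM_MSR_ERROR_CONTROL || !sim_msr_implemented) {
		*val = 0;
		return -EIO;
	}
	*val = sim_msr_value;
	return 0;
}

/* Returns 0 on success, -EIO where a real access would fault. */
static int sim_wrmsr_safe(uint32_t msr, uint64_t val)
{
	if (msr != SIM_MSR_ERROR_CONTROL || !sim_msr_implemented)
		return -EIO;
	sim_msr_value = val;
	return 0;
}

/* Returns 1 if extra logging was enabled, 0 if the MSR is unavailable. */
static int enable_extra_error_logging(void)
{
	uint64_t ctl;

	/* Probe first: never assume the CPU model implies the MSR exists. */
	if (sim_rdmsr_safe(SIM_MSR_ERROR_CONTROL, &ctl))
		return 0;

	ctl |= SIM_EXTRA_LOGGING_BIT;
	if (sim_wrmsr_safe(SIM_MSR_ERROR_CONTROL, ctl))
		return 0;

	return 1;
}
```

With sim_msr_implemented cleared, enable_extra_error_logging() backs off
and returns 0 instead of faulting; with it set, the bit ends up in the
simulated register.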

But maybe we should have a choice and maybe qemu/kvm should have a way
to ignore certain MSRs for certain CPU types, regardless of them being
architectural or not.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette