Re: [PATCH] x86/mce: Check for hypervisor before enabling additional error logging

From: Paolo Bonzini
Date: Tue Nov 10 2020 - 03:50:53 EST


On 10/11/20 07:31, Borislav Petkov wrote:
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return;
+
Frankly, I'm tired of wagging the dog because the tail can't. If
qemu/kvm can't emulate a CPU model fully then it should ignore those
unknown MSR accesses by default, i.e., that "ignore_msrs" functionality
should be on by default I'd say...

We certainly can't be sprinkling this check everytime the kernel tries
to do something as basic as read an MSR.

You don't have to, also because it's wrong. Fortunately it's much simpler than that:

1) ignore_msrs _cannot_ be on by default. You cannot know in advance that for all non-architectural MSRs it's okay for them to read as zero and eat writes. For some non-architectural MSR which never reads as zero on real hardware, who knows that there isn't some code using the contents of the MSR as a divisor, and causing a division by zero exception with ignore_msrs=1?

2) it's not just KVM. _Any_ hypervisor is bound to have this issue for some non-architectural MSRs. KVM just gets the flak because Linux CI environments (for obvious reasons) use it more than they use Hyper-V or ESXi or VirtualBox.

3) because of (1) and (2), the solution is very simple. If the MSR is architectural, its absence is a KVM bug and we'll fix it in all stable versions. If the MSR is not architectural (and 17Fh isn't; not only it's not mentioned in the SDM, even Google is failing me), never ever assume that the CPUID family/model/stepping implies a given MSR is there, and just use rdmsr_safe/wrmsr_safe.

So, for this patch,

Nacked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Paolo