Re: [PATCH] KVM: x86: Inject #UD on "unsupported" hypercall if patching fails

From: Paolo Bonzini
Date: Fri Dec 10 2021 - 17:41:36 EST


On 12/10/21 23:29, Sean Christopherson wrote:
Inject a #UD if patching in the correct hypercall fails, e.g. due to
emulator_write_emulated() failing because RIP is mapped not-writable by
the guest. The guest is likely doomed in any case, but observing a #UD
in the guest is far friendlier to debug/triage than a !WRITABLE #PF with
CR2 pointing at the RIP of the faulting instruction.

Ideally, KVM wouldn't patch at all; it's the guest's responsibility to
identify and use the correct hypercall instruction (VMCALL vs. VMMCALL).
Sadly, older Linux kernels prior to commit c1118b3602c2 ("x86: kvm: use
alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do the
wrong thing and blindly use VMCALL, i.e. removing the patching would
break running VMs with older kernels.

One could argue that KVM should be "fixed" to ignore guest paging
protections instead of injecting #UD, but patching in the first place was
a mistake as it was a hack-a-fix for a guest bug.

Sort of. I agree that patching is awful, but I'm not sure about injecting #UD vs. just doing the hypercall; the original reason for the patching was to allow Intel<->AMD cross-vendor migration to work somewhat.

That in turn promoted Linux's ill-conceived sloppiness of just using vmcall, which lasted until commit c1118b3602c2.

There are myriad fatal
issues with KVM's patching:

1. Patches using an emulated guest write, which will fail if RIP is not
mapped writable. This is the issue being mitigated.

2. Doesn't ensure the write is "atomic", e.g. a hypercall that splits a
page boundary will be handled as two separate writes, which means
that a partial, corrupted instruction can be observed by a vCPU.

Only the third bytes differs between VMCALL and VMMCALL so that's not really a problem. (Apparently what happened is that Microsoft asked Intel to use 0xc1 like AMD, and VMware asked AMD to use 0xd9 like Intel, or something like that; and they ended up swapping opcodes. But this may be an urban legend, no matter how plausible).

The big ones are 1 and 4.

Thanks,

Paolo

3. Doesn't serialize other CPU cores after updating the code stream.

4. Completely fails to account for the case where KVM is emulating due
to invalid guest state with unrestricted_guest=0. Patching and
retrying the instruction will result in vCPU getting stuck in an
infinite loop.

But, the "support" _so_ awful, especially #1, that there's practically
zero chance that a modern guest kernel can rely on KVM to patch the guest.
So, rather than proliferate KVM's bad behavior any further than the
absolute minimum needed for backwards compatibility, just try to make it
suck a little less.