Re: [PATCH v6 2/4] KVM: x86: report negative values from wrmsr emulation to userspace

From: Maxim Levitsky
Date: Wed Oct 28 2020 - 20:54:48 EST


On Tue, 2020-10-27 at 16:31 -0400, Qian Cai wrote:
> On Mon, 2020-10-26 at 15:40 -0400, Qian Cai wrote:
> > On Wed, 2020-09-23 at 00:10 +0300, Maxim Levitsky wrote:
> > > This will allow the KVM to report such errors (e.g -ENOMEM)
> > > to the userspace.
> > >
> > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> >
> > Reverting this and its dependency:
> >
> > 72f211ecaa80 KVM: x86: allow kvm_x86_ops.set_efer to return an error value
> >
> > on the top of linux-next (they have also unfortunately merged into the
> > mainline
> > at the same time) fixed an issue that a simple Intel KVM guest is unable to
> > boot
> > below.
>
> So I debug this a bit more. This also breaks nested virt (VMX). We have here:
>
> [ 345.504403] kvm [1491]: vcpu0 unhandled rdmsr: 0x4e data 0x0
> [ 345.758560] kvm [1491]: vcpu0 unhandled rdmsr: 0x1c9 data 0x0
> [ 345.758594] kvm [1491]: vcpu0 unhandled rdmsr: 0x1a6 data 0x0
> [ 345.758619] kvm [1491]: vcpu0 unhandled rdmsr: 0x1a7 data 0x0
> [ 345.758644] kvm [1491]: vcpu0 unhandled rdmsr: 0x3f6 data 0x0
> [ 345.951601] kvm [1493]: vcpu1 unhandled rdmsr: 0x4e data 0x0
> [ 351.857036] kvm [1493]: vcpu1 unhandled wrmsr: 0xc90 data 0xfffff
>
> After this commit, -ENOENT is returned to vcpu_enter_guest() causes the
> userspace to abort.

Thank you very much for debugging it!

Yestarday I tried pretty much everything to reproduce it on my intel's laptop
but I wasn't able.

I tried kvm/queue, then latest mainline, linux-next on my laptop
and all worked and even booted nested guests.

For qemu side,
I even tried RHEL's qemu, exact version as you tested.

I got really unlucky here that it seems that none of my guests ever write
an unknown msr.
Now with the information you provided, it is trivial to reproduce it
even on my AMD machine -
All I need to do is to write an unknown msr,
something like 'wrmsr 0x1234 0x0' using wrmsr tool.

And for the root cause of this, this is the fallout of last minute rebase of my code on top
of the userspace msr feature. I missed this -ENOENT logic that clashes with mine.

>
> kvm_msr_ignored_check()
> kvm_set_msr()
> kvm_emulate_wrmsr()
> vmx_handle_exit()
> vcpu_enter_guest()
>
> Something like below will unbreak the userspace, but does anyone has a better
> idea?

Checking for -ENOENT might be a right solution but I'll check now in depth,
what else interactions are affected and if this can be done closer to the
place where it happens.

Also, I am more inclined to add a new positive error code (similiar to KVM_MSR_RET_INVALID)
to indicate 'msr not found' error since this error condition is arch defined.

My reasoning is that positive error values should be used for error conditions
that cause #GP to the guest, while negative error values should only be used
for internal errors which are propogated to qemu userspace and usually kill
the guest.

I'll prepare a patch for this very soon.

Best regards,
Maxim Levitsky


>
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1748,7 +1748,7 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
> return 0;
>
> /* Signal all other negative errors to userspace */
> - if (r < 0)
> + if (r < 0 && r != -ENOENT)
> return r;
>
> /* MSR write failed? Inject a #GP */
>