Re: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly

From: Thomas Gleixner
Date: Fri Dec 10 2021 - 19:10:54 EST


On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 5089f2e7dc22..9811dc98d550 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
> fpstate->is_guest = true;
>
> gfpu->fpstate = fpstate;
> + gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;

This wants to be part of the previous patch, which introduces the field.

> gfpu->user_xfeatures = fpu_user_cfg.default_features;
> gfpu->user_perm = fpu_user_cfg.default_features;
> fpu_init_guest_permissions(gfpu);
> @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
> fpu->fpstate = guest_fps;
> guest_fps->in_use = true;
> } else {
> + fpu_save_guest_xfd_err(guest_fpu);

Hmm. See below.

> guest_fps->in_use = false;
> fpu->fpstate = fpu->__task_fpstate;
> fpu->__task_fpstate = NULL;
> @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> kvm_steal_time_set_preempted(vcpu);
> srcu_read_unlock(&vcpu->kvm->srcu, idx);
>
> + if (vcpu->preempted)
> + fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);

I'm not really excited about the thought of an exception cause register
in guest clobbered state.

Aside from that, I really have to ask: why is all this needed?

#NM in the guest is slow path, right? So why are you trying to optimize
for it?

The straightforward solution to this is:

1) Trap #NM and MSR_XFD_ERR write

2) When the guest triggers #NM it takes a VMEXIT and the host
does:

rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

injects the #NM and goes on.

3) When the guest writes to MSR_XFD_ERR it takes a VMEXIT and
the host does:

vcpu->arch.guest_fpu.xfd_err = msrval;
wrmsrl(MSR_XFD_ERR, msrval);

and goes back.

4) Before entering the preemption disabled section of the VCPU loop
do:

if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

5) Before leaving the preemption disabled section of the VCPU loop
do:

if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_XFD_ERR, 0);

It's really that simple and pretty much 0 overhead for the regular case.
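
To make the flow of steps 2) to 5) concrete, here is a rough user-space sketch. It is not kernel code: the MSR is modelled by a plain variable, and fake_rdmsrl(), fake_wrmsrl(), struct sketch_guest_fpu and all sketch_*() helpers are hypothetical names made up for illustration, standing in for rdmsrl()/wrmsrl() on MSR_XFD_ERR and for vcpu->arch.guest_fpu.xfd_err. The write counter only exists to show that nothing touches the MSR while xfd_err is 0.

```c
#include <stdint.h>

/* Hypothetical model of the hardware MSR_XFD_ERR; not a kernel API. */
static uint64_t fake_msr_xfd_err;
/* Counts modelled MSR writes so the "0 overhead" claim can be observed. */
static unsigned int fake_wrmsr_count;

static void fake_wrmsrl(uint64_t val)
{
	fake_msr_xfd_err = val;
	fake_wrmsr_count++;
}

static uint64_t fake_rdmsrl(void)
{
	return fake_msr_xfd_err;
}

/* Stand-in for the xfd_err field in the guest FPU container. */
struct sketch_guest_fpu {
	uint64_t xfd_err;
};

/* Step 2: #NM VMEXIT. Latch the error value; #NM injection is omitted. */
static void sketch_handle_nm_exit(struct sketch_guest_fpu *g)
{
	g->xfd_err = fake_rdmsrl();
}

/* Step 3: guest write to MSR_XFD_ERR traps. Update the copy and the MSR. */
static void sketch_handle_xfd_err_write(struct sketch_guest_fpu *g,
					uint64_t msrval)
{
	g->xfd_err = msrval;
	fake_wrmsrl(msrval);
}

/* Step 4: before entering the preemption disabled section. */
static void sketch_enter_guest(const struct sketch_guest_fpu *g)
{
	if (g->xfd_err)
		fake_wrmsrl(g->xfd_err);
}

/* Step 5: before leaving the preemption disabled section. */
static void sketch_leave_guest(const struct sketch_guest_fpu *g)
{
	if (g->xfd_err)
		fake_wrmsrl(0);
}
```

With xfd_err == 0, which is the regular case, sketch_enter_guest() and sketch_leave_guest() cost one compare each and never write the modelled MSR. Only a guest that has a pending #NM cause pays for the two writes per loop iteration.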

If the guest triggers #NM with a high frequency then taking the VMEXITs
is the least of the problems. That's not a realistic use case, really.

Hmm?

Thanks,

tglx