Re: [PATCH v19 080/130] KVM: TDX: restore host xsave state when exit from the guest TD

From: Chen Yu
Date: Sat Mar 09 2024 - 11:29:28 EST


On 2024-03-08 at 12:58:38 -0800, Isaku Yamahata wrote:
> On Thu, Mar 07, 2024 at 04:32:16PM +0800,
> Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> > On 2024-02-26 at 00:26:22 -0800, isaku.yamahata@xxxxxxxxx wrote:
> > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > >
> > > On exiting from the guest TD, xsave state is clobbered. Restore xsave
> > > state on TD exit.
> > >
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > ---
> > > v19:
> > > - Add EXPORT_SYMBOL_GPL(host_xcr0)
> > >
> > > v15 -> v16:
> > > - Added CET flag mask
> > >
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > ---
> > > arch/x86/kvm/vmx/tdx.c | 19 +++++++++++++++++++
> > > arch/x86/kvm/x86.c | 1 +
> > > 2 files changed, 20 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 9616b1aab6ce..199226c6cf55 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -2,6 +2,7 @@
> > > #include <linux/cpu.h>
> > > #include <linux/mmu_context.h>
> > >
> > > +#include <asm/fpu/xcr.h>
> > > #include <asm/tdx.h>
> > >
> > > #include "capabilities.h"
> > > @@ -534,6 +535,23 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > */
> > > }
> > >
> > > +static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > > +
> > > + if (static_cpu_has(X86_FEATURE_XSAVE) &&
> > > + host_xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0))
> > > + xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
> > > + if (static_cpu_has(X86_FEATURE_XSAVES) &&
> > > + /* PT can be exposed to TD guest regardless of KVM's XSS support */
> > > + host_xss != (kvm_tdx->xfam &
> > > + (kvm_caps.supported_xss | XFEATURE_MASK_PT | TDX_TD_XFAM_CET)))
> > > + wrmsrl(MSR_IA32_XSS, host_xss);
> > > + if (static_cpu_has(X86_FEATURE_PKU) &&
> > > + (kvm_tdx->xfam & XFEATURE_MASK_PKRU))
> > > + write_pkru(vcpu->arch.host_pkru);
> > > +}
> >
> > Maybe one minor question regarding the pkru restore. In the non-TDX version
> > kvm_load_host_xsave_state(), it first tries to read the current setting
> > vcpu->arch.pkru = rdpkru(); if this setting does not equal to host_pkru,
> > it trigger the write_pkru on host. Does it mean we can also leverage that mechanism
> > in TDX to avoid 1 pkru write(I guess pkru write is costly than a read pkru)?
>
> Yes, that's the intention. When we set the PKRU feature for the guest, TDX
> module unconditionally initialize pkru.

I see, thanks for the information. Please correct me if I'm wrong, and I'm not sure
if wrpkru instruction would trigger the TD exit. The TDX module spec[1] mentioned PKS
(protected key for supervisor pages), but does not metion PKU for user pages. PKS
is controlled by MSR IA32_PKRS. The TDX module will passthrough the MSR IA32_PKRS
write in TD, because TDX module clears the PKS bitmap in VMCS:
https://github.com/intel/tdx-module/blob/tdx_1.5/src/common/helpers/helpers.c#L1723
so neither write to MSR IA32_PKRS nor wrpkru triggers TD exit.

However, after a second thought, I found that after commit 72a6c08c44e4, the current
code should not be a problem, because write_pkru() would first read the current pkru
settings and decide whether to update to the pkru register.

> Do you have use case that wrpkru()
> (without rdpkru()) is better?

I don't have use case yet. But with/without rdpkru() in tdx_restore_host_xsave_state(),
there is no much difference because write_pkru() has taken care of it if I understand
correctly.

thanks,
Chenyu