RE: [PATCH] x86/fpu: Remove dynamic features from xcomp_bv for init_fpstate

From: Yao, Yuan
Date: Fri Oct 14 2022 - 00:03:52 EST


>-----Original Message-----
>From: Bae, Chang Seok <chang.seok.bae@xxxxxxxxx>
>Sent: Friday, October 14, 2022 00:23
>To: Yao, Yuan <yuan.yao@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
>Cc: x86@xxxxxxxxxx; Hansen, Dave <dave.hansen@xxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>Subject: Re: [PATCH] x86/fpu: Remove dynamic features from xcomp_bv for init_fpstate
>
>On 10/12/2022 8:35 PM, Yao, Yuan wrote:
>>
>> The reason is that __copy_xstate_to_uabi_buf() copies data from &init_fpstate when the component
>> does not exist in the source kernel fpstate (here the AMX tile component), but this patch
>> removes the AMX TILE bit from init_fpstate, so the WARN is triggered and NULL is returned,
>> which later causes a kernel NULL pointer dereference.
>
>We have this in __copy_xstate_to_uabi_buf() [1]:
>
> mask = fpstate->user_xfeatures;
>
> for_each_extended_xfeature(i, mask) {
> ...
> }
>
>And the KVM code seems to set dynamic features regardless of the buffer
>reallocation [2]:
>
> vcpu->arch.guest_fpu.fpstate->user_xfeatures =
> vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FPSSE;
>
>The kernel code seems to be aware of this as fpstate_realloc() does [3]:
>
> if (!guest_fpu)
> newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
>
>But it updates the 'xfeature' bitmask for all:
>
> newfps->xfeatures = curfps->xfeatures | xfeatures;
>
>So, I think we can do something like this here:
>
>diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>index c8340156bfd2..8ea7d0e95f1a 100644
>--- a/arch/x86/kernel/fpu/xstate.c
>+++ b/arch/x86/kernel/fpu/xstate.c
>@@ -1127,8 +1127,12 @@ void __copy_xstate_to_uabi_buf(struct membuf to,
>struct fpstate *fpstate,
> * non-compacted format disabled features still occupy state space,
> * but there is no state to copy from in the compacted
> * init_fpstate. The gap tracking will zero these states.
>+ *
>+ * In the case of guest fpstate, this user_xfeatures does not
>+ * dynamically reflect the capacity of the XSAVE buffer but
>+ * xfeatures does. So AND them together.
> */
>- mask = fpstate->user_xfeatures;
>+ mask = fpstate->user_xfeatures & fpstate->xfeatures;

This doesn't work. At this point KVM has already called fpstate_realloc() for the guest
fpstate, so the dynamic bits are already set in fpstate->xfeatures, which is 0x606e7 here.

Also, the guest fpstate's xstate_bv (header.xfeatures) is 0 here, so all data is read from
init_fpstate instead of the guest fpstate, which triggers the WARN when reading the AMX TILE component.
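
To illustrate the bit arithmetic, here is a minimal user-space sketch. The helper names are hypothetical, and the user_xfeatures value used in the usage note is an assumption (only fpstate->xfeatures = 0x606e7 was observed). Bit 18 is the AMX tile data component (XFEATURE_XTILE_DATA).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of the AMX tile data component (XFEATURE_XTILE_DATA). */
#define XTILE_DATA_BIT 18

/* Hypothetical helper: does the mask select component number nr? */
static bool mask_has_feature(uint64_t mask, int nr)
{
	return (mask >> nr) & 1;
}

/* Hypothetical helper mirroring the proposed "user_xfeatures & xfeatures". */
static uint64_t proposed_mask(uint64_t user_xfeatures, uint64_t xfeatures)
{
	return user_xfeatures & xfeatures;
}
```

With xfeatures = 0x606e7 and, as an assumption, the same value for user_xfeatures, mask_has_feature(proposed_mask(0x606e7, 0x606e7), XTILE_DATA_BIT) is still true, so the loop would still visit the AMX tile data component.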

To keep using init_fpstate as the fallback for reading component data in the above case, changes
like the ones below should work, but they remove the valuable WARN_ON_ONCE from __raw_xsave_addr():

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f9f45610c72f..1471de470b58 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -941,7 +941,7 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 		return NULL;

 	if (cpu_feature_enabled(X86_FEATURE_XCOMPACTED)) {
-		if (WARN_ON_ONCE(!(xcomp_bv & BIT_ULL(xfeature_nr))))
+		if (!(xcomp_bv & BIT_ULL(xfeature_nr)))
 			return NULL;
 	}

@@ -1049,7 +1049,10 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
 			 void *init_xstate, unsigned int size)
 {
-	membuf_write(to, from_xstate ? xstate : init_xstate, size);
+	if ((from_xstate && xstate) || (!from_xstate && init_xstate))
+		membuf_write(to, from_xstate ? xstate : init_xstate, size);
+	else
+		membuf_zero(to, size);
 }
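
For reference, the intended fallback behavior of copy_feature() can be tried outside the kernel with a rough user-space sketch like the one below. struct outbuf, outbuf_write() and outbuf_zero() are hypothetical stand-ins for the kernel's struct membuf, membuf_write() and membuf_zero(), and they only approximate the real semantics. The condition in the diff above is equivalent to checking whether the selected source pointer is non-NULL, which is how the sketch is written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct membuf. */
struct outbuf {
	unsigned char *p;	/* next byte to write */
	size_t left;		/* bytes remaining in the buffer */
};

/* Copy up to 'size' bytes from src, clamped to the space left. */
static void outbuf_write(struct outbuf *to, const void *src, size_t size)
{
	size_t n = size < to->left ? size : to->left;

	memcpy(to->p, src, n);
	to->p += n;
	to->left -= n;
}

/* Write up to 'size' zero bytes, clamped to the space left. */
static void outbuf_zero(struct outbuf *to, size_t size)
{
	size_t n = size < to->left ? size : to->left;

	memset(to->p, 0, n);
	to->p += n;
	to->left -= n;
}

/* Sketch of the patched copy_feature(): zero-fill when the source is NULL. */
static void copy_feature_sketch(bool from_xstate, struct outbuf *to,
				const void *xstate, const void *init_xstate,
				size_t size)
{
	const void *src = from_xstate ? xstate : init_xstate;

	if (src)
		outbuf_write(to, src, size);
	else
		outbuf_zero(to, size);
}
```

Calling copy_feature_sketch(false, &to, xstate, NULL, size) zero-fills the component in the output instead of dereferencing a NULL init_xstate pointer, which is the case hit for the AMX TILE component here.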

>
>Let me also test this by running KVM.
>
>Thanks,
>Chang
>
>[1]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n1131
>[2]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/cpuid.c#n346
>[3]
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n1448