Re: [PATCH v10 09/28] x86/fpu/xstate: Introduce helpers to manage the XSTATE buffer dynamically

From: Bae, Chang Seok
Date: Sun Oct 03 2021 - 18:36:59 EST


On Oct 1, 2021, at 07:20, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>> +/**
>> + * calculate_xstate_buf_size_from_mask - Calculate an xstate buffer size
>> + * @mask: A bitmap to tell which components to be saved in the buffer.
>> + *
>> + * Available once those arrays for the offset, size, and alignment info are
>> + * set up, by setup_xstate_features().
>> + *
>> + * Returns: The buffer size
>> + */
>> +unsigned int calculate_xstate_buf_size_from_mask(u64 mask)
>> +{
>> + unsigned int size = FXSAVE_SIZE + XSAVE_HDR_SIZE;
>> + int i, nr;
>> +
>> + if (!mask)
>> + return 0;
>> +
>> + /*
>> + * The minimum buffer size excludes the dynamic user state. When a
>> + * task uses the state, the buffer can grow up to the max size.
>> + */
>> + if (mask == (xfeatures_mask_all & ~xfeatures_mask_user_dynamic))
>> + return fpu_buf_cfg.min_size;
>> + else if (mask == xfeatures_mask_all)
>> + return fpu_buf_cfg.max_size;
>> +
>> + nr = fls64(mask) - 1;
>> + if (nr < FIRST_EXTENDED_XFEATURE)
>> + return size;
>> +
>> + /*
>> + * Each state offset in the non-compacted format is fixed. Take the
>> + * size from the last feature 'nr'.
>> + */
>> + if (!cpu_feature_enabled(X86_FEATURE_XSAVES))
>> + return xstate_offsets[nr] + xstate_sizes[nr];
>> +
>> + /*
>> + * With the given mask, no relevant size is found so far. So,
>> + * calculate it by summing up each state size.
>> + */
>> + for (i = FIRST_EXTENDED_XFEATURE; i <= nr; i++) {
>> + if (!(mask & BIT_ULL(i)))
>> + continue;
>> +
>> + if (xstate_64byte_aligned[i])
>> + size = ALIGN(size, 64);
>> + size += xstate_sizes[i];
>> + }
>> + return size;
>> +}
>
> So we have yet another slightly different function to calculate the
> buffer size. Why do we still need calculate_xstate_size()?

do_extra_xstate_size_checks() (renamed as calculate_xstate_size() in this
series) does calculate the XSTATE size. But it refers to CPUID as before
setup_xstate_features() which records offset, size, and alignment info. The
above hunk references the stored values at runtime.

Although I could aggressively try to combine them, I thought not that cleanup
is necessary for this series. But perhaps I may follow-up this as cleanup patch.

>> +void free_xstate_buffer(struct fpu *fpu)
>> +{
>
> Can you please put the check:
>
>> + if (fpu->state != &fpu->__default_state)
>
> into this function? If it is ever called without checking it and state
> points at fpu->__default_state then the explosions are going to be
> interesting.
>
>> + vfree(fpu->state);
>> +}

I wrote that originally. But it has been changed like this, as per Boris’:

>> +void free_xstate_buffer(struct fpu *fpu)
>> +{
>> + /* Free up only the dynamically-allocated memory. */

>This belongs above the function along with an explanation when it needs
>to be called.

>> + if (fpu->state != &fpu->__default_state)
>> + vfree(fpu->state);
>> +}


It seems that I missed comments on those call sites. But I would like to take
your point.

>> +/**
>> + * realloc_xstate_buffer - Re-alloc a buffer with the size calculated from
>> + * @mask.
>> + *
>> + * @fpu: A struct fpu * pointer
>> + * @mask: The bitmap tells which components to be reserved in the new
>> + * buffer.
>> + *
>> + * It deals with enlarging the xstate buffer with dynamic states.
>> + *
>> + * Use vzalloc() simply here. If the task with a vzalloc()-allocated buffer
>> + * tends to terminate quickly, vfree()-induced IPIs may be a concern.
>> + * Caching may be helpful for this. But the task with large state is likely
>> + * to live longer.
>> + *
>> + * Also, this method does not shrink or reclaim the buffer.
>> + *
>> + * Returns 0 on success, -ENOMEM on allocation error.
>> + */
>> +int realloc_xstate_buffer(struct fpu *fpu, u64 mask)
>> +{
>> + union fpregs_state *state;
>> + u64 state_mask;
>> +
>> + state_mask = fpu->state_mask | mask;
>> + if ((state_mask & fpu->state_mask) == state_mask)
>> + return 0;
>> +
>> + state = vzalloc(calculate_xstate_buf_size_from_mask(state_mask));
>> + if (!state)
>> + return -ENOMEM;
>> +
>> + /*
>> + * As long as the register state is intact, save the xstate in the
>> + * new buffer at the next context switch or ptrace's context
>> + * injection.
>
> What exactly guarantees that current's xstate is valid in the hardware
> registers? This has to be fully preemptible context, otherwise you could
> not invoke vzalloc() which can sleep.
>
> Which in turn means that the placement of the exception fixup in a later
> patch is broken:
>
>> DEFINE_IDTENTRY(exc_device_not_available)
>> {
>> unsigned long cr0 = read_cr0();
>>
>> + if (handle_xfd_event(&current->thread.fpu, regs))
>> + return;
>
> And no, we are not going to use an atomic allocation for this.
>
> For the other call site from xstateregs_set() the FPU register state is
> definitely not live because the task is stopped and the FPU registers
> if live belong to the ptracer.
>
> So you really want something like this:
>
> static struct fpregs_state *swap_fpstate(fpu, newstate, mask, size)
> {
> old_state = fpu->state;
> fpu->state = newstate;
> fpu->state_mask = mask;
> fpu->state_size = size;
> return old_state != fpu->__default_state ? old_state : NULL;
> }
>
> int realloc_xstate_buffer(struct task_struct *tsk, u64 mask)
> {
> old_state;
> fpu = tsk->fpu;
>
> size = calc_size(state_mask);
> state = vzalloc(size);
> if (!state)
> return -ENOMEM;
>
> if (cpu_feature_enabled(X86_FEATURE_XSAVES))
> fpstate_init_xstate(&state->xsave, state_mask);
>
> if (tsk != current) {
> // PTRACE ....
> old_state = swap_fpstate(fpu, state, state_mask, size);
> } else {
> fpregs_lock();
> if (!registers_valid())
> copy_state(state, fpu->state);
> old_state = swap_fpstate(fpu, state, state_mask, size);
> fpregs_unlock();
> }
>
> vfree(old_state);
> return 0;
> }
>
> And the exception fixup has to move into the irq enabled region. I come
> to that patch later.

Okay, IIUC, (1) vzalloc() can sleep, (2) the call site has to be preemptible,
thus (3) lock register states.

Thanks for the code.

>> @@ -1147,6 +1268,8 @@ static int copy_uabi_to_xstate(struct fpu *fpu, const void *kbuf,
>> if (validate_user_xstate_header(&hdr))
>> return -EINVAL;
>>
>> + hdr.xfeatures &= fpu->state_mask;
>
> This is really the wrong patch doing this. This belongs into the one
> which introduces fpu->state_mask.

The state_mask field was introduced here:

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ad5cbf922e30..0cc9f6c5a10c 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -336,6 +336,13 @@ struct fpu {
*/
unsigned long avx512_timestamp;

+ /*
+ * @state_mask:
+ *
+ * The bitmap represents state components to be saved in ->state.
+ */
+ u64 state_mask;

But as you suggested in the other email, I’m going to create a new patch that
introduces new mask and state fields in struct fpu and add this.

Thanks,
Chang