Re: [RFC PATCH 2/3] rseq: extend struct rseq with per thread group vcpu id

From: Florian Weimer
Date: Tue Feb 01 2022 - 16:30:25 EST


* Mathieu Desnoyers:

> ----- On Feb 1, 2022, at 3:32 PM, Florian Weimer fw@xxxxxxxxxxxxx wrote:
> [...]
>>
>>>> Is the switch really useful? I suspect it's faster to just write as
>>>> much as possible all the time. The switch should be well-predictable
>>>> if running uniform userspace, but still …
>>>
>>> The switch ensures the kernel doesn't try to write to a memory area beyond
>>> the rseq size which has been registered by user-space. So it seems to be
>>> useful to ensure we don't corrupt user-space memory. Or am I missing your
>>> point?
>>
>> Due to the alignment, I think you'd only ever see 32 and 64 bytes for
>> now?
>
> Yes, but I would expect the rseq registration arguments to have a rseq_len
> of offsetofend(struct rseq, tg_vcpu_id) when userspace wants the tg_vcpu_id
> feature to be supported (but not the following features).
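
For reference, the length check under discussion can be sketched roughly
as below. The struct layout is a hypothetical stand-in (demo_rseq and
demo_may_write_tg_vcpu_id are illustrative names, and the field order is
an assumption, not the layout from the patch); only the offsetofend()
definition matches the kernel macro of that name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's offsetofend() helper. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical layout, for illustration only. */
struct demo_rseq {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t node_id;     /* assumed earlier extension field */
	uint32_t tg_vcpu_id;  /* assumed field for this discussion */
} __attribute__((aligned(32)));

/* The alignment pads the struct to the original 32-byte size. */
static_assert(sizeof(struct demo_rseq) == 32, "unexpected padding");

/*
 * Return nonzero if an area registered with length registered_len is
 * large enough for the kernel to store tg_vcpu_id without writing past
 * the end of what userspace handed over.
 */
static int demo_may_write_tg_vcpu_id(size_t registered_len)
{
	return registered_len >= offsetofend(struct demo_rseq, tg_vcpu_id);
}
```

With this layout, a registration of 24 bytes would not receive
tg_vcpu_id updates, while 28 or 32 bytes would.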

But if rseq is managed by libc, it really has to use the full size
unconditionally. I would even expect that eventually, the kernel only
supports the initial 32, maybe 64 for a few early extensions, and the
size indicated by the auxiliary vector.

Not all of that area would be ABI; some of it would be used by the
vDSO only and opaque to userspace applications (with applications/libcs
passing __rseq_offset as an argument to these functions).

>> I'd appreciate it if you could put the maximum supported size and possibly
>> the alignment in the auxiliary vector, so that we don't have to issue rseq
>> system calls in a loop on process startup.
>
> Yes, it's a good idea. I'm not too familiar with the auxiliary vector.
> Are we talking about the kernel's
>
> fs/binfmt_elf.c:fill_auxv_note()
>
> ?

Indeed.
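
To make the libc side concrete, here is a minimal sketch of what startup
could look like if the kernel exported both values through the auxiliary
vector. The key names AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN and their
numeric values are assumptions for this sketch, not something this patch
defines; rseq_area_params, rseq_area_from_values and rseq_area_query are
hypothetical helpers. getauxval() is the existing glibc interface and
returns 0 for a key the kernel did not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Assumed auxv keys; only defined here if the headers lack them. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

/* Size and alignment of the initial rseq area, per this thread. */
#define RSEQ_ORIG_SIZE 32

struct rseq_area_params {
	size_t size;   /* bytes to allocate and pass as rseq_len */
	size_t align;  /* required alignment of the area */
};

/*
 * Turn the two auxv values into allocation parameters. Missing or
 * implausible values fall back to the original 32/32. Rounding the
 * size up to the alignment is a choice made for this sketch.
 */
static struct rseq_area_params
rseq_area_from_values(unsigned long feature_size, unsigned long align)
{
	struct rseq_area_params p = { RSEQ_ORIG_SIZE, RSEQ_ORIG_SIZE };

	/* Accept only power-of-two alignments of at least 32. */
	if (align >= RSEQ_ORIG_SIZE && (align & (align - 1)) == 0)
		p.align = align;
	if (feature_size > p.size)
		p.size = feature_size;
	p.size = (p.size + p.align - 1) & ~(p.align - 1);
	assert(p.size % p.align == 0);
	return p;
}

/* One auxv lookup per value, instead of probing rseq() in a loop. */
static struct rseq_area_params rseq_area_query(void)
{
	return rseq_area_from_values(getauxval(AT_RSEQ_FEATURE_SIZE),
				     getauxval(AT_RSEQ_ALIGN));
}
```

On a kernel without these entries both lookups return 0 and the result
is the original 32-byte area, so no registration retry loop is needed
either way.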