Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

From: Mathieu Desnoyers
Date: Mon Jul 02 2018 - 21:19:09 EST


----- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote:

>> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>> wrote:
>>
>> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers
>> mathieu.desnoyers@xxxxxxxxxxxx wrote:
>>
>>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx
>>> wrote:
>>>
>>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>>>
>>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>>>
>>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>>> writing the same value to them (zero) anyway.
>>>>
>>>> So who cares if the write ends up being two instructions, because the
>>>> write to the upper bits doesn't actually *do* anything.
>>>>
>>>> Hmm?
>>>
>>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>>> won't be torn into something daft like byte-per-byte stores when performed
>>> from C code ?
>>>
>>> I don't worry whether the upper bits get updated or how, but I really care
>>> about not having store tearing of the low bits update.
>>
>> For the records, most updates of those low bits are done in assembly
>> from critical sections, for which we control exactly how the update is
>> performed.
>>
>> However, there is one helper function in user-space that updates that value
>> from C through a volatile store, e.g.:
>>
>> static inline void rseq_prepare_unload(void)
>> {
>> __rseq_abi.rseq_cs = 0;
>> }
>
> How about making the field be:
>
> union {
> __u64 rseq_cs;
> struct {
> __u32 rseq_cs_low;
> __u32 rseq_cs_high;
> };
> };
>
> 32-bit user code that cares about performance can just write to rseq_cs_low
> because it already knows that rseq_cs_high == 0.
>
> The header could even supply a static inline helper write_rseq_cs() that
> atomically writes a pointer and just does the right thing for 64-bit, for
> 32-bit BE, and for 32-bit LE.
>
> I think the union really is needed because we canât rely on user code being
> built with -fno-strict-aliasing. Or the helper could use inline asm.
>
> Anyway, the point is that we get optimal code generation (a single instruction
> write of the correct number of bits) without any compat magic in the kernel.

That works for me! Any objection from anyone else for this approach ?

Thanks,

Mathieu

>
>>
>> Thanks,
>>
>> Mathieu
>>
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com