Re: [RFC PATCH 1/2] rseq: Implement KTLS prototype for x86-64

From: Florian Weimer
Date: Thu Oct 29 2020 - 11:37:06 EST


* Mathieu Desnoyers:

> ----- On Sep 29, 2020, at 4:13 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:
>
>> * Mathieu Desnoyers:
>>
>>>> So we have a bootstrap issue here that needs to be solved, I think.
>>>
>>> The one thing I'm not sure about is whether the vDSO interface is indeed
>>> superior to KTLS, or if it is just the model we are used to.
>>>
>>> AFAIU, the current use-cases for vDSO is that an application calls into
>>> glibc, which then calls the vDSO function exposed by the kernel. I wonder
>>> whether the vDSO indirection is really needed if we typically have a glibc
>>> function used as indirection ? For an end user, what is the benefit of vDSO
>>> over accessing KTLS data directly from glibc ?
>>
>> I think the kernel can only reasonably maintain a single userspace data
>> structure. It's not reasonable to update several versions of the data
>> structure in parallel.
>
> I disagree with your statement. Considering that the kernel needs to
> keep ABI compatibility for whatever it exposes to user-space, claiming
> that it should never update several versions of data structures
> exposed to user-space in parallel means that once a data structure is
> exposed to user-space as ABI in a certain way, it can never ever
> change in the future, even if we find a better way to do things.

I think it's possible to put data into userspace without making it ABI.
Think about the init_module system call. The module blob comes from
userspace, but its (deeper) internal structure does not have a stable
ABI. Similar for many BPF use cases.

If the internal KTLS blob structure turns into ABI, including the parts
that need to be updated on context switch, each versioning change has a
performance impact.

>> This means that glibc would have to support multiple kernel data
>> structures, and users might lose userspace acceleration after a kernel
>> update, until they update glibc as well. The glibc update should be
>> ABI-compatible, but someone would still have to backport it, apply it to
>> container images, etc.
>
> No. If the kernel ever exposes a data structure to user-space as ABI,
> then it needs to stay there, and not break userspace. Hence the need to
> duplicate information provided to user-space if need be, so we can move
> on to better ABIs without breaking the old ones.

It can expose the data as an opaque blob.

> Or as Andy mentioned, we would simply pass the ktls offset as argument to
> the vDSO ? It seems simple enough. Would it fit all our use-cases including
> errno ?

That would work, yes. It's neat, but it won't give you a way to provide
traditional syscall wrappers directly from the vDSO.
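
For illustration only, here is a rough sketch of what passing the offset
could look like. Everything in it is made up: the blob layout, the names
fake_tcb and fake_vdso_gettid, and the explicit thread pointer argument (a
real vDSO function would presumably read the thread pointer itself and take
only the offset).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blob layout; nothing in this thread fixes one. */
struct ktls_blob {
    int32_t tid;
    int32_t cpu;
};

/* Stand-in for a thread control block owned by libc (assumption). */
struct fake_tcb {
    char libc_private[64];
    struct ktls_blob ktls;
};

/*
 * Stand-in for a vDSO entry point. The caller supplies the offset of the
 * KTLS area relative to the thread pointer, so the function does not need
 * to know how libc lays out its own per-thread data.
 */
static int32_t fake_vdso_gettid(const void *thread_pointer,
                                ptrdiff_t ktls_offset)
{
    assert(thread_pointer != NULL);
    const struct ktls_blob *blob =
        (const struct ktls_blob *)((const char *)thread_pointer + ktls_offset);
    return blob->tid;
}
```

A caller would pass something like offsetof(struct fake_tcb, ktls) as the
second argument. Only the vDSO side interprets the blob, so its internal
layout does not have to become ABI for libc.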

>> We'll see what will break once we have the correct TID after vfork. 8->
>> glibc currently supports malloc-after-vfork as an extension, and
>> a lot of software depends on it (OpenJDK, for example).
>
> I am not sure to see how that is related to ktls ?

The mutex implementation could switch to the KTLS TID because it is always
correct. But then locking in a vfork'ed subprocess would no longer look
like locking from the parent thread because the TID would be different.
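
A simplified model of the problem, not the actual glibc code: the type
owner_lock and the function owner_lock_try are hypothetical, the lock is
not atomic, and the TID values are arbitrary. It only shows how an
owner-tracking lock behaves when the TID it is given changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical owner-tracking (recursive) lock; 0 means unowned. */
struct owner_lock {
    int32_t owner_tid;
    unsigned int count;
};

/*
 * Try to take the lock on behalf of self_tid. Returns true if the lock
 * was free or is already owned by self_tid, false if another TID owns it
 * (a real implementation would block or report an error instead).
 */
static bool owner_lock_try(struct owner_lock *lock, int32_t self_tid)
{
    if (lock->owner_tid == 0) {
        lock->owner_tid = self_tid;
        lock->count = 1;
        return true;
    }
    if (lock->owner_tid == self_tid) {
        lock->count++;
        return true;
    }
    return false;
}
```

If the parent thread (say TID 100) holds the lock and the vfork'ed child
still presents TID 100, the second acquisition counts as recursion and
succeeds. If the child presents its real TID (say 101), the same call is
treated as contention from a different thread.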

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill