Re: tools/testing/selftests/kvm/rseq_test and glibc 2.35

From: Gavin Shan
Date: Mon Aug 08 2022 - 21:58:30 EST


On 8/9/22 10:57 AM, Mathieu Desnoyers wrote:

----- Gavin Shan <gshan@xxxxxxxxxx> wrote:
Hi Florian,

On 8/9/22 2:01 AM, Florian Weimer wrote:
It has come to my attention that the KVM rseq test apparently needs to
be ported to glibc 2.35. The background is that on aarch64, rseq is the
only way to get a practically useful sched_getcpu. (There's no hidden
per-task CPU state the vDSO could reveal as the CPU ID.)


Yes, kvm/selftests/rseq needs to support glibc 2.35. The question is
whether this is about glibc 2.34 or 2.35, because kvm/selftests/rseq
also fails on glibc 2.34.

I would guess the upstream glibc-2.35 feature is enabled in the
downstream glibc-2.34?

# ./rseq_test
==== Test Assertion Failure ====
rseq_test.c:60: !r
pid=112043 tid=112043 errno=22 - Invalid argument
1 0x0000000000401973: main at rseq_test.c:226
2 0x0000ffff84b6c79b: ?? ??:0
3 0x0000ffff84b6c86b: ?? ??:0
4 0x0000000000401b6f: _start at ??:?
rseq failed, errno = 22 (Invalid argument)
# rpm -aq | grep glibc-2
glibc-2.34-39.el9.aarch64


The main rseq tests have already been adjusted via:

commit 233e667e1ae3e348686bd9dd0172e62a09d852e1
Author: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Date: Mon Jan 24 12:12:45 2022 -0500

selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35
glibc-2.35 (upcoming release date 2022-02-01) exposes the rseq per-thread
data in the TCB, accessible at an offset from the thread pointer, rather
than through an actual Thread-Local Storage (TLS) variable, as the
Linux kernel selftests initially expected.
The __rseq_abi TLS and glibc-2.35's ABI for per-thread data cannot
actively coexist in a process, because the kernel supports only a single
rseq registration per thread.
Here is the scheme introduced to ensure selftests can work both with an
older glibc and with glibc-2.35+:
- librseq exposes its own "rseq_offset, rseq_size, rseq_flags" ABI.
- librseq queries for glibc rseq ABI (__rseq_offset, __rseq_size,
__rseq_flags) using dlsym() in a librseq library constructor. If those
are found, copy their values into rseq_offset, rseq_size, and
rseq_flags.
- Else, if those glibc symbols are not found, handle rseq registration
from librseq and use its own IE-model TLS to implement the rseq ABI
per-thread storage.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20220124171253.22072-8-mathieu.desnoyers@xxxxxxxxxxxx
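
The lookup step of the scheme described in that commit message can be sketched roughly as follows. This is only an illustration, not the librseq code: struct rseq_abi_info and query_libc_rseq() are hypothetical names, and the symbol types are assumed to match the declarations in glibc's <sys/rseq.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical holder for the values a helper library would publish. */
struct rseq_abi_info {
	ptrdiff_t offset;   /* rseq area offset from the thread pointer */
	unsigned int size;  /* size of the rseq area reported by libc */
	unsigned int flags; /* flags reported by libc */
	int from_libc;      /* 1 if the values were copied from glibc */
};

/*
 * Look up glibc's __rseq_* symbols at run time. Returns 1 and copies
 * their values into *info when all three are present. Returns 0 and
 * leaves *info zeroed otherwise, in which case the caller would have
 * to register rseq on its own.
 */
static int query_libc_rseq(struct rseq_abi_info *info)
{
	const ptrdiff_t *off;
	const unsigned int *size, *flags;

	assert(info != NULL);
	info->offset = 0;
	info->size = 0;
	info->flags = 0;
	info->from_libc = 0;

	off = dlsym(RTLD_DEFAULT, "__rseq_offset");
	size = dlsym(RTLD_DEFAULT, "__rseq_size");
	flags = dlsym(RTLD_DEFAULT, "__rseq_flags");
	if (!off || !size || !flags)
		return 0;

	info->offset = *off;
	info->size = *size;
	info->flags = *flags;
	info->from_libc = 1;
	return 1;
}
```

On glibc versions older than 2.34 this would presumably need to be linked with -ldl. A caller would invoke query_libc_rseq() once at startup and only attempt its own rseq registration when it returns 0.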

But I don't see a similar adjustment for
tools/testing/selftests/kvm/rseq_test.c. As an additional wrinkle,
you'd have to start calling getcpu (glibc function or system call)
because comparing rseq.cpu_id against sched_getcpu won't test anything
anymore once glibc implements sched_getcpu using rseq.
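
As a rough sketch of the system call variant, the CPU number can be requested from the kernel directly so that the value does not come from the rseq area. getcpu_via_syscall() is a hypothetical helper name, not something the selftest defines.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel which CPU the calling thread is running on, without
 * going through libc's sched_getcpu(). Returns 0 on success and stores
 * the CPU number in *cpu; returns -1 with errno set on failure.
 */
static int getcpu_via_syscall(unsigned int *cpu)
{
	assert(cpu != NULL);
	/* The node and the unused third argument are not needed here. */
	return (int)syscall(SYS_getcpu, cpu, NULL, NULL);
}
```

The test would then compare rseq.cpu_id against the value stored by this helper instead of the return value of sched_getcpu().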

We noticed this because our downstream glibc version, while based on
2.34, enables rseq registration by default. To facilitate coordination
with rseq application usage, we also backported the __rseq_* ABI
symbols, so the selftests could use that even in our downstream version.
(We enable the glibc tunables downstream, but they are an optional
glibc feature, so it's probably better in the long run to fix the kernel
selftests rather than using the tunables as a workaround.)


Thanks for the pointer. It makes sense. So does that mean rseq registration
has already been done by glibc TLS? In that case, kvm/selftests/rseq is
unable to register again.

The registration is done by glibc initialization and thread startup code.


I will come up with something similar for kvm/selftests/rseq.

Make sure to check the rseq selftests fixes recently pulled in the current merge window as well. One is relevant:

https://github.com/torvalds/linux/commit/d1a997ba4c1bf65497d956aea90de42a6398f73a

We may want to find a way to remove this duplicated rseq.c code eventually.


Thanks, Mathieu. The check for 'rseq-size' will be included as well. I almost
have something working. I will post the fixes after some tests.
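
A minimal sketch of what such a size check could look like, assuming that a zero __rseq_size means glibc did not complete the registration (as the glibc manual describes). The enum and function names are hypothetical and not taken from the selftests.

```c
#include <assert.h>
#include <stddef.h>

/* Who is expected to perform the rseq registration for this thread. */
enum rseq_owner {
	RSEQ_OWNER_LIBC, /* glibc registered rseq, reuse its area */
	RSEQ_OWNER_SELF, /* the test has to register rseq itself */
};

/*
 * libc_size_p is the address of glibc's __rseq_size as returned by a
 * symbol lookup, or NULL when the symbol is absent. Presence of the
 * symbol alone is not treated as sufficient: the size must be nonzero.
 */
static enum rseq_owner rseq_pick_owner(const unsigned int *libc_size_p)
{
	if (libc_size_p != NULL && *libc_size_p != 0)
		return RSEQ_OWNER_LIBC;
	return RSEQ_OWNER_SELF;
}

/* Exercises the three cases: symbol missing, size zero, size nonzero. */
static void rseq_pick_owner_selfcheck(void)
{
	unsigned int zero = 0, registered = 32;

	assert(rseq_pick_owner(NULL) == RSEQ_OWNER_SELF);
	assert(rseq_pick_owner(&zero) == RSEQ_OWNER_SELF);
	assert(rseq_pick_owner(&registered) == RSEQ_OWNER_LIBC);
}
```

The value 32 in the self-check is only an example of a nonzero size; the decision depends on the size being nonzero, not on its exact value.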

Thanks,
Gavin