Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq

From: Steven Rostedt
Date: Thu Sep 28 2023 - 10:43:34 EST


On Thu, 28 Sep 2023 12:39:26 +0200
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> As always, are syscalls really *that* expensive? Why can't we busy wait
> in the kernel instead?

Yes, syscalls are that expensive. Several years ago I had a good talk
with Robert Haas (one of the PostgreSQL maintainers) at Linux Plumbers,
and I asked him if they used futexes. His answer was "no". He told me
how they did several benchmarks and it was a huge performance hit (and
this was before Spectre/Meltdown made things much worse). He explained
to me that most locks are taken just to flip a few bits. Going into the
kernel and coming back was orders of magnitude longer than the critical
sections. Going into the kernel caused a ripple effect and led to even
more contention. Their answer was to implement their locking
completely in user space without any help from the kernel.

This is when I thought that having an adaptive spinner that could get
hints from the kernel via memory mapping would be extremely useful.

The obvious problem with their implementation is that if the owner is
sleeping, there's no point in spinning. Worse, the owner may even be
waiting for the spinner to get off the CPU before it can run again. But
according to Robert, the gain in the general performance greatly
outweighed the few times this happened in practice.

But still, if userspace could figure out if the owner is running on
another CPU or not, to act just like the adaptive mutexes in the
kernel, that would prevent the problem of a spinner keeping the owner
from running.
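
To make that concrete, here is a rough userspace sketch of the idea. It
is only an illustration under assumptions: the names (thread_hint,
on_cpu, adaptive_lock_acquire) are hypothetical and not from the patch,
the on_cpu flag stands in for whatever hint the kernel would export
through shared memory, and sched_yield() stands in for a real blocking
wait.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-thread hint. In a real design the kernel would keep
 * on_cpu up to date; here it is just a flag the caller sets by hand.
 * The hint must outlive any lock the thread may own.
 */
struct thread_hint {
	atomic_bool on_cpu;
};

struct adaptive_lock {
	_Atomic(struct thread_hint *) owner;	/* NULL when unlocked */
};

/* One attempt to take the lock with a single compare-and-swap. */
static bool adaptive_trylock(struct adaptive_lock *l, struct thread_hint *self)
{
	struct thread_hint *expected = NULL;

	return atomic_compare_exchange_strong_explicit(&l->owner, &expected,
			self, memory_order_acquire, memory_order_relaxed);
}

static void adaptive_lock_acquire(struct adaptive_lock *l,
				  struct thread_hint *self)
{
	for (;;) {
		struct thread_hint *owner;

		if (adaptive_trylock(l, self))
			return;

		owner = atomic_load_explicit(&l->owner, memory_order_relaxed);

		/*
		 * Owner is not running: spinning cannot help and may even
		 * keep it off the CPU, so give the CPU away. A real
		 * implementation would block (e.g. on a futex) instead.
		 */
		if (owner && !atomic_load_explicit(&owner->on_cpu,
						   memory_order_relaxed))
			sched_yield();

		/* Otherwise the owner is running: loop and try again. */
	}
}

static void adaptive_unlock(struct adaptive_lock *l, struct thread_hint *self)
{
	/* Only the current owner may release the lock. */
	assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
	atomic_store_explicit(&l->owner, NULL, memory_order_release);
}
```

The uncontended path is a single compare-and-swap and never enters the
kernel, which matters when the critical section only flips a few bits.
The hint is only consulted once the lock is found to be held.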

-- Steve