Re: [RFC PATCH 1/4] rseq: Add sched_state field to struct rseq

From: Noah Goldstein
Date: Tue May 23 2023 - 12:33:04 EST


On Tue, May 23, 2023 at 7:49 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> On 2023-05-19 16:51, Noah Goldstein wrote:
> > On Wed, May 17, 2023 at 10:28 AM Mathieu Desnoyers via Libc-alpha
> > <libc-alpha@xxxxxxxxxxxxxx> wrote:
> >>
> >> Expose the "on-cpu" state for each thread through struct rseq to allow
> >> adaptative mutexes to decide more accurately between busy-waiting and
> >> calling sys_futex() to release the CPU, based on the on-cpu state of the
> >> mutex owner.
> >>
> >> It is only provided as an optimization hint, because there is no
> >> guarantee that the page containing this field is in the page cache, and
> >> therefore the scheduler may very well fail to clear the on-cpu state on
> >> preemption. This is expected to be rare though, and is resolved as soon
> >> as the task returns to user-space.
> >>
> >> The goal is to improve use-cases where the duration of the critical
> >> sections for a given lock follows a multi-modal distribution, preventing
> >> statistical guesses from doing a good job at choosing between busy-wait
> >> and futex wait behavior.
> >>
> >> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> >> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> >> Cc: Jonathan Corbet <corbet@xxxxxxx>
> >> Cc: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> >> Cc: Carlos O'Donell <carlos@xxxxxxxxxx>
> >> Cc: Florian Weimer <fweimer@xxxxxxxxxx>
> >> Cc: libc-alpha@xxxxxxxxxxxxxx
> >> ---
> >> include/linux/sched.h | 12 ++++++++++++
> >> include/uapi/linux/rseq.h | 17 +++++++++++++++++
> >> kernel/rseq.c | 14 ++++++++++++++
> >> 3 files changed, 43 insertions(+)
> >>
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index eed5d65b8d1f..c7e9248134c1 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -2351,11 +2351,20 @@ static inline void rseq_signal_deliver(struct ksignal *ksig,
> >> rseq_handle_notify_resume(ksig, regs);
> >> }
> >>
> >> +void __rseq_set_sched_state(struct task_struct *t, unsigned int state);
> >> +
> >> +static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
> >> +{
> >> + if (t->rseq)
> >> + __rseq_set_sched_state(t, state);
> >> +}
> >> +
> >> /* rseq_preempt() requires preemption to be disabled. */
> >> static inline void rseq_preempt(struct task_struct *t)
> >> {
> >> __set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
> >> rseq_set_notify_resume(t);
> >> + rseq_set_sched_state(t, 0);
> >
> > Should rseq_migrate also be made to update the cpu_id of the new core?
> > I imagine the usage of this will be something along the lines of:
> >
> > if(!on_cpu(mutex->owner_rseq_struct) &&
> > cpu(mutex->owner_rseq_struct) == this_threads_cpu)
> > // goto futex
> >
> > So I would think updating on migrate would be useful as well.
>
> I don't think we want to act differently based on the cpu on which the
> owner is queued.
>
> If the mutex owner is not on-cpu, and queued on the same cpu as the
> current thread, we indeed want to call sys_futex WAIT.
>
> If the mutex owner is not on-cpu, but queued on a different cpu than the
> current thread, we *still* want to call sys_futex WAIT, because
> busy-waiting for a thread which is queued but not currently running is
> wasteful.
>
I think this is less clear-cut. In some cases, sure, but not always.
Going to the futex has more latency than waiting in userland, and, if
the system is not busy (other than the one process), most likely less
latency than yield. Going to the futex also requires a syscall on
unlock.
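
(For reference, the unlock-side syscall is the usual futex locking
pattern, not anything specific to this series. A rough sketch along
the lines of the classic three-state lock from "Futexes Are Tricky",
assuming a 0 = unlocked / 1 = locked / 2 = locked-with-waiters
encoding of the lock word:

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void futex_unlock(atomic_uint *word)
    {
            /* 2 means some waiter may be sleeping in FUTEX_WAIT. */
            if (atomic_exchange(word, 0) == 2) {
                    /*
                     * The owner pays for a FUTEX_WAKE syscall on every
                     * contended unlock; a waiter that only spins in
                     * userland never imposes this cost on the owner.
                     */
                    syscall(SYS_futex, word, FUTEX_WAKE, 1,
                            NULL, NULL, 0);
            }
    }

So whether waiters spin or sleep also changes the cost on the owner's
side.)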

For example, if the critical section is expected to be very small, it
would be easy to imagine the lock being better implemented with
something like:

    while (is_locked)
        if (owner->on_cpu || owner->cpu != my_cpu)
            exponential backoff
        else
            yield
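
Fleshed out a bit, a minimal user-space sketch of that policy could
look like the following. This is only a sketch: the owner_state layout
and the on_cpu/cpu field names are placeholders for whatever the final
rseq ABI exposes, my_cpu could instead come from this thread's own
rseq cpu_id, and real code would need single-copy-atomic reads of the
kernel-updated fields:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdatomic.h>

    struct owner_state {            /* kernel-updated; read-only here */
            unsigned int on_cpu;    /* nonzero while the owner is running */
            int cpu;                /* CPU the owner last ran on */
    };

    struct lock {
            atomic_uint locked;
            struct owner_state *owner;  /* published by the current owner */
    };

    /* Wait for the lock to be released, without ever touching futex. */
    static void lock_wait(struct lock *l)
    {
            unsigned int backoff = 1;
            int my_cpu = sched_getcpu();

            while (atomic_load_explicit(&l->locked, memory_order_acquire)) {
                    const struct owner_state *o = l->owner;

                    if (o->on_cpu || o->cpu != my_cpu) {
                            /*
                             * Owner is running, or queued on some other
                             * CPU: spin with exponential backoff instead
                             * of paying for futex wait/wake syscalls.
                             */
                            for (unsigned int i = 0; i < backoff; i++)
                                    atomic_signal_fence(memory_order_seq_cst);
                            if (backoff < (1u << 10))
                                    backoff <<= 1;
                    } else {
                            /*
                             * Owner was preempted on our own CPU: spinning
                             * only delays it, so give the CPU up instead.
                             */
                            sched_yield();
                    }
            }
    }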

It's not that "just go to futex" never makes sense, but I don't think
it's fair to say that's *always* the case.

Looking at the kernel code, keeping the CPU field updated during
migration doesn't seem to be a particularly high cost, so it seems
like a "why not?" kind of question.
> Or am I missing something ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>