Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq

From: Mathieu Desnoyers
Date: Thu Sep 28 2023 - 16:44:40 EST


On 9/28/23 16:21, Thomas Gleixner wrote:
On Mon, May 29 2023 at 15:14, Mathieu Desnoyers wrote:
+void __rseq_set_sched_state(struct task_struct *t, unsigned int state);
+
+static inline void rseq_set_sched_state(struct task_struct *t, unsigned int state)
+{
+ if (t->rseq_sched_state)
+ __rseq_set_sched_state(t, state);

This is invoked on every context switch and writes over that state
unconditionally even in the case that the state was already
cleared. There are enough situations where tasks are scheduled out
several times while being in the kernel.

Right, if this becomes more than a PoC, I'll make sure to keep track of the current state within the task struct, and only update userspace on a state transition.
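For illustration, a minimal userspace sketch of that idea. All names here (struct sched_state_task_sketch, sketch_set_sched_state(), the write counter) are hypothetical stand-ins and not part of the patch; a plain pointer store models what would be a write to the userspace rseq area in the kernel.

```c
/*
 * Hypothetical model: cache the last value published to userspace and
 * skip the (comparatively expensive) user write when nothing changed.
 */
struct sched_state_task_sketch {
	unsigned int *user_sched_state;   /* models the userspace sched_state field */
	unsigned int cached_sched_state;  /* last value written through the pointer */
	unsigned long nr_user_writes;     /* instrumentation for this sketch only */
};

static inline void sketch_set_sched_state(struct sched_state_task_sketch *t,
					  unsigned int state)
{
	if (!t->user_sched_state)
		return;                   /* no area registered */
	if (t->cached_sched_state == state)
		return;                   /* no transition: skip the user write */
	t->cached_sched_state = state;
	*t->user_sched_state = state;     /* stands in for the store to user memory */
	t->nr_user_writes++;
}
```

With this shape, a task scheduled out several times while already in the cleared state only pays for a load and a compare after the first transition.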


/* rseq_preempt() requires preemption to be disabled. */
static inline void rseq_preempt(struct task_struct *t)
{
__set_bit(RSEQ_EVENT_PREEMPT_BIT, &t->rseq_event_mask);
rseq_set_notify_resume(t);
+ rseq_set_sched_state(t, 0);

This code is already stupid to begin with. __set_bit() is cheap, but
rseq_set_notify_resume() is not as it has a conditional and a locked
instruction

What alternative would you recommend to set a per-thread state that has the same effect as TIF_NOTIFY_RESUME? Indeed, all it really needs to synchronize with is the thread owning the flags, but AFAIU having this flag as part of the TIF flags requires an atomic instruction to synchronize updates against concurrent threads.

If we move this thread flag into a separate field of struct thread_info, then we could turn this atomic set bit into a more lightweight store, but then we'd need to check against an extra field on return to userspace.

And if we want to remove the conditional branch on the scheduler fast-path, we could always load and test both the task struct's rseq pointer and the thread_info "preempted" state on return to userspace.

The tradeoff there would be to add extra loads and conditional branches on return to userspace to speed up the scheduler fast-path.

Is this what you have in mind, or am I missing your point?
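To make the tradeoff described above concrete, here is a rough userspace sketch using C11 atomics. Every name in it (struct thread_info_sketch, rseq_preempted, SKETCH_TIF_NOTIFY_RESUME, the sketch_* helpers) is hypothetical and only models the two options; it is not kernel code and assumes the new field is only ever touched by the CPU running the owning thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TIF_NOTIFY_RESUME (1UL << 0)

struct thread_info_sketch {
	atomic_ulong flags;            /* models TIF flags, updated concurrently */
	unsigned char rseq_preempted;  /* hypothetical separate, owner-only field */
};

struct rseq_task_sketch {
	void *rseq;                    /* non-NULL when an rseq area is registered */
	struct thread_info_sketch ti;
};

/* Models the existing path: a conditional branch plus an atomic RMW. */
static inline void sketch_notify_atomic(struct rseq_task_sketch *t)
{
	if (t->rseq)
		atomic_fetch_or(&t->ti.flags, SKETCH_TIF_NOTIFY_RESUME);
}

/* Alternative: unconditional plain store on the scheduler fast-path. */
static inline void sketch_notify_plain(struct rseq_task_sketch *t)
{
	t->ti.rseq_preempted = 1;
}

/*
 * The cost moves here: return to userspace loads and tests both the
 * rseq pointer and the separate "preempted" field.
 */
static inline bool sketch_exit_needs_rseq_work(struct rseq_task_sketch *t)
{
	bool preempted = t->ti.rseq_preempted;

	t->ti.rseq_preempted = 0;      /* consume the event either way */
	return t->rseq != NULL && preempted;
}
```

In this model the scheduler side shrinks to a single store, while the return-to-userspace side gains two loads and a branch. Whether that is a net win would have to be measured.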

and now you add two more conditionals into the context
switch path.

I'm open to suggestions on how to improve this if this goes beyond PoC stage and we observe measurable benefits on the userspace side.

Thanks,

Mathieu


Thanks,

tglx

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com