Re: [PATCH v2 1/4] sched/core: Provide sched_rtmutex() and expose sched work helpers

From: Peter Zijlstra
Date: Thu May 11 2023 - 09:53:29 EST


On Tue, May 09, 2023 at 05:14:38PM -0500, Crystal Wood wrote:
> On Wed, 2023-05-03 at 15:20 +0200, Peter Zijlstra wrote:

> > Urgh, so I really don't like this.
> >
> > The end result is something like:
> >
> >         rt_mutex_lock()
> >           sched_submit_work();
> >             // a nested rt_mutex_lock() here will not clobber
> >             // ->pi_blocked_on because it's not set yet.
> >
> >           task_blocks_on_rt_mutex();
> >             tsk->pi_blocked_on = waiter;
> >             rt_mutex_enqueue(lock, waiter); <-- the real problem
> >
> >           rt_mutex_slowlock_block();
> >             schedule_rtmutex();
> >
> >           sched_resume_work();
> >
> > And all of this it not just because tsk->pi_blocked_on, but mostly
> > because of task_blocks_on_rt_mutex() enqueueing the waiter. The whole
> > enqueue thing is what makes the 'simple' solution of saving/restoring
> > tsk->pi_blocked_on not work.
> >
> > Basically the pi_blocked_on curruption is a side effect, not the
> > fundamental issue. One task having two waiters registered is the bigger
> > issue.
>
> Where do you see pi_blocked_on being saved/restored?

I'm not, I'm saying that *IF* ->pi_blocked_on corruption were the real
problem that would be a sufficient solution.

But ->pi_blocked_on corruption is just a wee side effect of the real
problem, which is that the waiter is already enqueued and we've already
done PI and can't very well back out of all that in a hurry.

> The whole point of
> this patchset is to deal with sched_submit_work() before anything has
> been done on the "outer" lock acquisition (not just pi_blocked_on, but
> also enqueuing) other than failing the fast path.

It's also terribly fragile, sprinkling stuff all over that shouldn't be
sprinkled.

And it's sprinkled far wider than it needs to be -- per the below
argument it really only should be applied to rtlock, not to rt_mutex or
ww_rt_mutex or any of the others that normally block and shouldn't be
used anyway.

> > Now, sched_submit_work() could also use (regular) mutex -- after all
> > it's a fully preemptible context. And then we're subject to the 'same'
> > problem but with tsk->blocked_on (DEBUG_MUTEXES=y).
>
> It's fully preemptible but it still shouldn't be doing things that would
> block on non-RT. That'd already be broken for a number of reasons (task
> state corruption, infinite recursion if current->plug isn't cleared
> before doing whatever causes another standard schedule(), etc).

task->state is fairly immune to corruption normally -- the typical
case is that the nested block resolves and resets it to RUNNING, at
which point the outer loop 'fails' to schedule() and re-tries. All that
is mostly harmless.

But yes, all that code *SHOULD* not block, but nothing is really
enforcing that.