Re: [PATCH] sched/core: Schedule new worker even if PI-blocked

From: Sebastian Andrzej Siewior
Date: Tue Aug 20 2019 - 10:59:33 EST


On 2019-08-20 15:50:14 [+0200], Peter Zijlstra wrote:
> On Fri, Aug 16, 2019 at 06:06:26PM +0200, Sebastian Andrzej Siewior wrote:
> > If a task is PI-blocked (blocking on sleeping spinlock) then we don't want to
> > schedule a new kworker if we schedule out due to lock contention because !RT
> > does not do that as well.
>
> s/as well/either/
>
> > A spinning spinlock disables preemption and a worker
> > does not schedule out on lock contention (but spin).
>
> I'm not much liking this; it means that rt_mutex and mutex have
> different behaviour, and there are 'normal' rt_mutex users in the tree.

There isc RCU (boosting) and futex. I'm sceptical about the i2c usersâ

> > On RT the RW-semaphore implementation uses an rtmutex so
> > tsk_is_pi_blocked() will return true if a task blocks on it. In this case we
> > will now start a new worker
>
> I'm confused, by bailing out early it does _NOT_ start a new worker; or
> am I reading it wrong?

s@now@not@. Your eyes work good, soory for that.

> > which may deadlock if one worker is waiting on
> > progress from another worker.
>
> > Since a RW-semaphore starts a new worker on !RT, we should do the same on RT.
> >
> > XFS is able to trigger this deadlock.
> >
> > Allow to schedule new worker if the current worker is PI-blocked.
>
> Which contradicts earlier parts of this changelog.
>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/core.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3945,7 +3945,7 @@ void __noreturn do_task_dead(void)
> >
> > static inline void sched_submit_work(struct task_struct *tsk)
> > {
> > - if (!tsk->state || tsk_is_pi_blocked(tsk))
> > + if (!tsk->state)
> > return;
> >
> > /*
> > @@ -3961,6 +3961,9 @@ static inline void sched_submit_work(str
> > preempt_enable_no_resched();
> > }
> >
> > + if (tsk_is_pi_blocked(tsk))
> > + return;
> > +
> > /*
> > * If we are going to sleep and we have plugged IO queued,
> > * make sure to submit it to avoid deadlocks.
>
> What do we need that clause for? Why is pi_blocked special _at_all_?

so !RT the scheduler does nothing special if a task blocks on sleeping
lock.
If I remember correctly then blk_schedule_flush_plug() is the problem.
It may require a lock which is held by the task.
It may hold A and wait for B while another task has B and waits for A.
If my memory does bot betray me then ext+jbd can lockup without this.

Sebastian