Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227] 6.4.7-rc1 review)

From: Roy Hopkins
Date: Mon Jul 31 2023 - 12:08:39 EST


On Mon, 2023-07-31 at 16:52 +0200, Peter Zijlstra wrote:
> On Mon, Jul 31, 2023 at 07:48:19AM -0700, Guenter Roeck wrote:
>
> > > I've taken your config above, and the rootfs.ext2 and run-sh from x86/.
> > > I've then modified run-sh to use:
> > >
> > >    qemu-system-x86_64 -enable-kvm -cpu host
> > >
> > > What I'm seeing is that some boots get stuck at:
> > >
> > > [    0.608230] Running RCU-tasks wait API self tests
> > >
> > > Is this the right 'problem' ?
> > >
> >
> >
> > Yes, exactly.
>
> Excellent! Let me prod that with something sharp, see what comes
> creeping out.

In an effort to get up to speed with this area of the kernel, I've been playing
around with this too today and managed to reproduce the problem using the same
configuration. I'm completely new to this code, but I think I may have found the
root cause.

What I've found is a race condition between starting the RCU tasks grace-period
kthread in rcu_spawn_tasks_kthread_generic() and a subsequent call to
synchronize_rcu_tasks_generic(). The race results in rtp->tasks_gp_mutex being
locked by the initial thread, which then blocks the newly started grace-period
thread.

The problem is that although synchronize_rcu_tasks_generic() checks whether the
grace-period kthread is running, it does so using rtp->kthread_ptr. That pointer
is set only in the thread's entry point, not when the thread is created, so it
becomes non-NULL only after the creating thread yields or is preempted. If that
has not happened before the next call to synchronize_rcu_tasks_generic(), a
deadlock occurs.
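To make the ordering concrete, here is a deliberately simplified, single-threaded
C model of that check. It is not kernel code: struct rcu_tasks_model,
enum sync_path, model_spawn_kthread(), model_kthread_entry() and
model_synchronize() are hypothetical names, and the scheduler is replaced by
explicit calls so the window is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which path a synchronize call takes in this model. */
enum sync_path {
	SYNC_INLINE,      /* caller runs the grace period itself (takes tasks_gp_mutex) */
	SYNC_VIA_KTHREAD  /* caller waits for the grace-period kthread */
};

/* Hypothetical, heavily simplified stand-in for struct rcu_tasks. */
struct rcu_tasks_model {
	void *kthread_ptr;    /* written by the kthread at its entry point */
	bool kthread_created; /* models "the thread has been created" */
};

static int fake_task; /* stands in for the kthread's task_struct */

/* Models rcu_spawn_tasks_kthread_generic(): the thread now exists,
 * but nothing has recorded that in kthread_ptr yet. */
static void model_spawn_kthread(struct rcu_tasks_model *rtp)
{
	rtp->kthread_created = true;
}

/* Models the kthread's entry point, which only runs once the creator
 * yields or is preempted. */
static void model_kthread_entry(struct rcu_tasks_model *rtp)
{
	assert(rtp->kthread_created);
	rtp->kthread_ptr = &fake_task;
}

/* Models the kthread_ptr check in synchronize_rcu_tasks_generic(). */
static enum sync_path model_synchronize(const struct rcu_tasks_model *rtp)
{
	if (rtp->kthread_ptr != NULL)
		return SYNC_VIA_KTHREAD;
	return SYNC_INLINE;
}
```

Between model_spawn_kthread() and model_kthread_entry(), model_synchronize()
still returns SYNC_INLINE even though the kthread already exists; that gap is
the window in which the caller ends up holding the mutex.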

I've created a debug patch that introduces a new flag in struct rcu_tasks, which
is set when the kthread is created, and used it in synchronize_rcu_tasks_generic()
in place of READ_ONCE(rtp->kthread_ptr). This fixes the issue in my test
environment.
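The debug patch itself isn't included here, but the idea can be sketched as a
simplified, single-threaded C model (not kernel code). All names are
hypothetical; in particular kthread_spawned is a made-up name for the new flag,
and real kernel code would access both fields with READ_ONCE()/WRITE_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sync_path {
	SYNC_INLINE,      /* caller runs the grace period itself */
	SYNC_VIA_KTHREAD  /* caller waits for the grace-period kthread */
};

/* Hypothetical, simplified stand-in for struct rcu_tasks. */
struct rcu_tasks_model {
	void *kthread_ptr;    /* still written by the kthread at its entry point */
	bool kthread_spawned; /* hypothetical new flag, written by the creator */
};

static int fake_task; /* stands in for the kthread's task_struct */

/* Models the spawn path: the flag is set as soon as the thread has been
 * created, before the new thread has had a chance to run. */
static void model_spawn_kthread(struct rcu_tasks_model *rtp)
{
	rtp->kthread_spawned = true;
}

/* Models the kthread's entry point, which still sets kthread_ptr itself. */
static void model_kthread_entry(struct rcu_tasks_model *rtp)
{
	assert(rtp->kthread_spawned);
	rtp->kthread_ptr = &fake_task;
}

/* Models synchronize_rcu_tasks_generic() with the check switched from
 * kthread_ptr to the creator-side flag. */
static enum sync_path model_synchronize(const struct rcu_tasks_model *rtp)
{
	if (rtp->kthread_spawned)
		return SYNC_VIA_KTHREAD;
	return SYNC_INLINE;
}
```

Because the creating thread writes the flag itself, it is visible to that thread
as soon as the spawn returns, so no yield or preemption is needed before later
callers defer to the kthread.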

I'm happy to have a go at submitting a patch for this if it helps.