Re: [BUG] Re: Linux 6.4.4

From: Paul E. McKenney
Date: Thu Jul 20 2023 - 15:47:49 EST


On Thu, Jul 20, 2023 at 03:32:35PM -0400, Joel Fernandes wrote:
> On 7/20/23 15:04, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2023 at 12:31:13PM -0400, Joel Fernandes wrote:
> >> Hi Paul,
> >>
> >> On Thu, Jul 20, 2023 at 11:55 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>
> >>> On Thu, Jul 20, 2023 at 01:27:14PM +0000, Joel Fernandes wrote:
> [...]
> >>>>
> >>>> So likely RCU boosting is failing:
> >>>>
> >>>> The full TREE03 splat:
> >>>> [ 54.243588] ------------[ cut here ]------------
> >>>> [ 54.244547] rcu-torture: rcu_torture_boost started
> [...]
> >>>> [ 54.300499] RIP: 0010:rcu_torture_stats_print+0x5b2/0x620
> [...]
> >>>> [ 2169.481783] rcu_torture_writer: rtort_pipe_count: 9
> >>>>
> >>>> However, if we are to believe the '9', it appears the object made it
> >>>> quite some way toward the end of the pipe array, but not to the free pool.
> >>>
> >>> This is from this if/for statement, correct?
> >>>
> >>> stutter_waited = stutter_wait("rcu_torture_writer");
> >>> if (stutter_waited &&
> >>>     !atomic_read(&rcu_fwd_cb_nodelay) &&
> >>>     !cur_ops->slow_gps &&
> >>>     !torture_must_stop() &&
> >>>     boot_ended)
> >>>         for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
> >>>                 if (list_empty(&rcu_tortures[i].rtort_free) &&
> >>>                     rcu_access_pointer(rcu_torture_current) !=
> >>>                     &rcu_tortures[i]) {
> >>>                         tracing_off();
> >>>                         show_rcu_gp_kthreads();
> >>>                         WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
> >>>                         rcu_ftrace_dump(DUMP_ALL);
> >>>                 }
> >>
> >> Yes, that's right.
> >>
> >>> If so, this happens when there was a stutter wait, but RCU grace
> >>> periods failed to clear out the backlog during the several seconds that
> >>> rcutorture was forced idle. This might be related to the RCU priority
> >>> boosting failure, in which a preempted reader persisted across the
> >>> stutter interval.
> >>
> >> When RCU is operating normally, shouldn't the
> >> "list_empty(&rcu_tortures[i].rtort_free)" check hold off until the preempted
> >> reader unblocks and exits its RCU read-side critical section?
> >
> > Yes, but not just "until", but rather "long after". If RCU is doing
> > grace periods correctly, an active reader on a given rcu_tortures[]
> > element will prevent .rtort_pipe_count from exceeding the value 2.
>
> Ah ok, so the rtort_pipe_count being 9 is a sign that RCU isn't making
> progress, which is why the object is absent from the free list.

Yes, though RCU is -just- -barely- too slow, as one more grace period
would have done it.
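
To make the "one more grace period" arithmetic concrete, here is a minimal
sketch of the pipe-advance idea, one advance per completed grace period.
This is simplified and paraphrased, not the exact rcutorture code; the
function name and the free-list handling below are made up:

	#include <linux/list.h>

	#define RCU_TORTURE_PIPE_LEN 10

	struct rcu_torture {
		struct list_head rtort_free;	/* Links element onto the free list. */
		int rtort_pipe_count;		/* Grace periods seen since last use. */
	};

	static LIST_HEAD(rcu_torture_freelist);

	/* Invoked once per completed grace period for each in-flight element. */
	static bool rcu_torture_pipe_advance(struct rcu_torture *rp)
	{
		if (++rp->rtort_pipe_count < RCU_TORTURE_PIPE_LEN)
			return false;	/* Still in the pipe. */
		/* Ten grace periods have elapsed, so recycle the element. */
		list_add(&rp->rtort_free, &rcu_torture_freelist);
		return true;
	}

With this scheme, a count of 9 is exactly one grace period short of the
element being handed back to the free list, which is what the warning in
the code quoted above caught.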

> > The element will not be put back on the free list until .rtort_pipe_count
> > is equal to RCU_TORTURE_PIPE_LEN, which is 10.
> >
> > This warning usually appears when something is holding up the grace-period
> > kthread. Historically, this has included deadlocks, missed timers,
> > and whatever else can prevent the grace-period kthread from running.
>
> Makes sense.
>
> >> One thing that confuses me: in the case of
> >> "cur_ops->deferred_free(old_rp);", the earlier do-while loop may exit
> >> before the async callbacks have finished. So what prevents the
> >> "list_empty(&rcu_tortures[i].rtort_free)" check from happening before
> >> those grace periods complete? Thanks for any clarification.
> >
> > We only enter this code if the stutter_wait() actually waited, and by
> > default this function will wait about five seconds. Since the rcutorture
> > testing goes idle during this time period (or is supposed to!), if things
> > are working properly, knocking off ten grace periods during that time
> > should be pretty much a given.
>
> Sure, makes sense. And this is not lazy RCU, so 5 seconds should be plenty
> ;). I think I was subconsciously expecting an rcu_barrier() somewhere in the
> code before those checks, but that's not needed, as you pointed out that the
> stutter should give RCU enough time to make progress.

And there might need to be a call_rcu_hurry() in there somewhere,
now that you mention it. Which would pretty much defeat any sort of
lazy-RCU-callback testing in rcutorture, but testing of laziness might
need to be separate anyway.
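
For illustration, here is a minimal sketch of the pattern; struct foo and
its helpers are made-up names, not anything in rcutorture. call_rcu_hurry()
queues the callback like call_rcu(), but without the CONFIG_RCU_LAZY
batching delay:

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		struct rcu_head rh;
		/* ... payload ... */
	};

	static void foo_reclaim(struct rcu_head *rhp)
	{
		kfree(container_of(rhp, struct foo, rh));
	}

	static void foo_free(struct foo *fp)
	{
		/*
		 * call_rcu() may batch lazy callbacks for seconds under
		 * CONFIG_RCU_LAZY; call_rcu_hurry() instead requests
		 * invocation after a normal, non-lazy grace period.
		 */
		call_rcu_hurry(&fp->rh, foo_reclaim);
	}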

> So hmm, the count being 9 means that not enough RCU grace periods have
> passed for the rcu_torture object in question, thus keeping it allocated.
> The GP kthread not getting CPU time can indeed do that, or perhaps something
> else is stalling RCU, like a preempted reader, lengthy preemption disabling
> on a CPU, and so forth. I'll try to collect a trace when it happens.

Looking forward to seeing what you come up with!

Thanx, Paul