Re: Hit WARN_ON() in rcutorture.c:1055

From: Paul E. McKenney
Date: Mon Mar 23 2020 - 14:44:41 EST


On Mon, Mar 23, 2020 at 06:23:51PM +0000, Qais Yousef wrote:
> On 03/23/20 11:10, Paul E. McKenney wrote:
> > On Mon, Mar 23, 2020 at 05:41:48PM +0000, Qais Yousef wrote:
> > > On 03/23/20 10:17, Paul E. McKenney wrote:
> > > > On Mon, Mar 23, 2020 at 05:06:10PM +0000, Qais Yousef wrote:
> > > > > On 03/23/20 08:57, Paul E. McKenney wrote:
> > > > > > On Mon, Mar 23, 2020 at 03:43:09PM +0000, Qais Yousef wrote:
> > > > > > > Hi
> > > > > > >
> > > > > > > I hit the following warning while running rcutorture tests. It only happens
> > > > > > > when I try to hibernate the system (arm64 Juno-r2).
> > > > > >
> > > > > > Hibernating the system during rcutorture tests. Now that is gutsy! ;-)
> > > > >
> > > > > Hehe was just a side effect of testing the cpu hotplug stuff :-)
> > > > >
> > > > > >
> > > > > > > Let me know if you need additional info.
> > > > > >
> > > > > > 1. Do you need this to work? If so, please tell me your use case.
> > > > >
> > > > > Nope. It just happened while trying to stress the cpu hotplug series I just
> > > > > posted.
> > > > >
> > > > > > 2. What is line 1055 of your rcutorture.c? Here is my guess:
> > > > >
> > > > > It's 5.6-rc6, sorry should have mentioned in the report.
> > > > >
> > > > > > 	/* Cycle through nesting levels of rcu_expedite_gp() calls. */
> > > > > > 	if (can_expedite &&
> > > > > > 	    !(torture_random(&rand) & 0xff & (!!expediting - 1))) {
> > > > > > 		WARN_ON_ONCE(expediting == 0 && rcu_gp_is_expedited());
> > > > > > 		if (expediting >= 0)
> > > > > > 			rcu_expedite_gp();
> > > > > > 		else
> > > > > > 			rcu_unexpedite_gp();
> > > > > > 		if (++expediting > 3)
> > > > > > 			expediting = -expediting;
> > > > > > 	} else if (!can_expedite) { /* Disabled during boot, recheck. */
> > > > >
> > > > > > If it's something you don't care about, then I don't care about it either. I just
> > > > > > thought I'd report it in case it uncovered something worthwhile.
> > > >
> > > > Well, my guess was wrong. ;-)
> > > >
> > > > This is instead rcutorture being surprised by the fact that RCU grace
> > > > periods are expedited during the hibernate process. I could fix this
> > > > particular situation, but I bet that there are a number of others,
> > > > including my guess above.
> > > >
> > > > One approach would be to halt rcutorture testing just before hibernating
> > > > and restart it just after resuming.
> > > >
> > > > Thoughts?
> > >
> > > {register, unregister}_pm_notifier() don't seem to be too hard to use.
> >
> > That part is easy. It would also be necessary to find all the affected
> > warnings in rcutorture and suppress them, not only during this time,
> > but also for some period of time afterwards. Maybe this is the only one,
> > but that would be surprising. ;-)
>
> Wouldn't it be easier to just deinit/init()? i.e., treat it like unload/load module.
>
> But you'll lose some info then that maybe you'd like to keep across
> suspend/resume cycles.

Hmmm... Are you running rcutorture as a loadable module or built into
your kernel? In the latter case, it starts up automatically shortly
after boot.

> > > But if it's not that simple, then it's not worthwhile I'd say. The report
> > > lives in LKML as a documentation of this missing support :-P
> >
> > It might at some point be necessary for rcutorture to handle suspends
> > and hibernates in midstream, and yes, it could be done, but first I need
> > to see some reason why it provides significant help.
>
> Sounds reasonable to me. I agree my use case isn't sensible in general. It just
> happened because I was testing an operation that affected both hibernation and
> rcutorture test and I tend to compile my kernels with everything built-in.

Don't get me wrong, I am very happy to see people making use of
rcutorture! ;-)

Thanx, Paul
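
For anyone following along, here is a rough user-space sketch of the
nesting cycle quoted above and of why an outside call to
rcu_expedite_gp() (such as the one made during hibernation, per the
discussion above) would trip the WARN_ON_ONCE(). It is only a model under
stated assumptions: the single nesting counter and every model_* name are
hypothetical stand-ins, not the kernel's implementation, and the rnd
argument takes the place of torture_random(&rand).

```c
#include <stdbool.h>

/* Hypothetical stand-in for RCU's global expedite nesting state. */
static int expedite_nesting;

static void model_expedite_gp(void)     { expedite_nesting++; }
static void model_unexpedite_gp(void)   { expedite_nesting--; }
static bool model_gp_is_expedited(void) { return expedite_nesting > 0; }

/*
 * One pass of the quoted cycle, with can_expedite assumed true.
 * Returns true if the WARN_ON_ONCE() condition would have been met.
 */
static bool model_cycle_step(int *expediting, unsigned long rnd)
{
	bool warned = false;

	/*
	 * (!!e - 1) is all ones when e == 0 and zero otherwise, so a new
	 * cycle starts only when the low byte of rnd is zero, but a cycle
	 * already in progress advances on every pass.
	 */
	if (!(rnd & 0xff & (unsigned long)(!!*expediting - 1))) {
		warned = (*expediting == 0 && model_gp_is_expedited());
		if (*expediting >= 0)
			model_expedite_gp();
		else
			model_unexpedite_gp();
		if (++*expediting > 3)
			*expediting = -*expediting;
	}
	return warned;
}

/* Run n passes with rnd == 0 so every pass takes the branch. */
static int model_run(int *expediting, int n)
{
	int warnings = 0;

	for (int i = 0; i < n; i++)
		warnings += model_cycle_step(expediting, 0);
	return warnings;
}
```

In this model, eight passes starting from expediting == 0 make four
expedite calls followed by four unexpedite calls and leave the nesting
count back at zero, so no warning condition is met. If something else
calls model_expedite_gp() while expediting == 0, the very next pass that
takes the branch sees the condition.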