Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

From: Nicholas Piggin
Date: Sun Oct 09 2022 - 23:49:34 EST


On Thu Sep 29, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> On Wed, Sep 28, 2022 at 10:51 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> >
> > On Wed Sep 28, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > Thanks, Nick, for reviewing my patch.
> > >
> > > On Tue, Sep 27, 2022 at 12:25 PM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > > >
> > > > On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> > > > > This is the second version of my fix for PPC's "WARNING: suspicious RCU usage".
> > > > > I improved the fix under Paul E. McKenney's guidance:
> > > > > Link: https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzhouyi@xxxxxxxxx/T/
> > > > >
> > > > > During CPU offlining, the sub-functions of xive_teardown_cpu call
> > > > > __lock_acquire when CONFIG_LOCKDEP=y. __lock_acquire traverses an
> > > > > RCU-protected list, so "WARNING: suspicious RCU usage" is
> > > > > triggered.
> > > > >
> > > > > Avoid lockdep when we are offline.
> > > >
> > > > I don't see how this is safe. If RCU is no longer watching the CPU then
> > > > the memory it is accessing here could be concurrently freed. I think the
> > > > warning is valid.
> > > Agree
> > > >
> > > > powerpc's problem is that cpuhp_report_idle_dead() is called before
> > > > arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
> > > > I would say xive cleanup just needs to be done earlier. I wonder why it
> > > > is not done in __cpu_disable or thereabouts, that's where the interrupt
> > > > controller is supposed to be stopped.
> > > Yes, I learned the following sequence of events from kgdb debugging:
> > > __cpu_disable -> pseries_cpu_disable -> set_cpu_online(cpu, false),
> > > which leads to do_idle: if (cpu_is_offline(cpu)) -> arch_cpu_idle_dead,
> > > so the xive cleanup should be done in pseries_cpu_disable.
> >
> > It's a good catch and a reasonable approach to the problem.
> Thanks, Nick, for your encouragement ;-)
> >
> > > But as a beginner, I am afraid that I cannot do the above
> > > sophisticated work without errors, although I would very much like to.
> > > Could an expert do this for us?
> >
> > This will be difficult for anybody, it's tricky code. I'm not an
> > expert at it.
> >
> > It looks like the interrupt controller disable split has been there
> > since long before xive. I would try just moving them together, then see
> > if that works.
> Yes, I used "git blame" (I learned "git blame" from Paul E. McKenney ;-) )
> and saw the same.
> I look forward to your great work!

I was thinking you could try it and see whether it works and what you find,
if you are interested and have time to look into it.
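
To make the ordering issue concrete, here is a rough user-space sketch. It
is not kernel code: every sim_* name is made up for illustration, and it
assumes, as described earlier in the thread, that RCU stops watching the
CPU once it has reported itself dead and that the interrupt-controller
teardown takes a lockdep-checked lock. It only counts how many simulated
"suspicious RCU usage" warnings each teardown placement would produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: all sim_* names are hypothetical, not kernel APIs. */

enum sim_teardown_site {
    SIM_TEARDOWN_IN_IDLE_DEAD,   /* teardown after the CPU reported dead */
    SIM_TEARDOWN_IN_CPU_DISABLE  /* teardown moved earlier, as suggested */
};

struct sim_cpu {
    bool online;
    bool rcu_watching;  /* assumed to go false once the CPU reports dead */
    int  rcu_warnings;  /* simulated "suspicious RCU usage" count */
};

/* Stand-in for a lock acquisition checked by lockdep. */
static void sim_lock_acquire(struct sim_cpu *cpu)
{
    if (!cpu->rcu_watching) {
        cpu->rcu_warnings++;
        fprintf(stderr, "WARNING: suspicious RCU usage (simulated)\n");
    }
}

/* Stand-in for the interrupt-controller teardown, which takes a lock. */
static void sim_xive_teardown(struct sim_cpu *cpu)
{
    sim_lock_acquire(cpu);
}

/* Stand-in for the __cpu_disable step: RCU is still watching here. */
static void sim_cpu_disable(struct sim_cpu *cpu, enum sim_teardown_site site)
{
    if (site == SIM_TEARDOWN_IN_CPU_DISABLE)
        sim_xive_teardown(cpu);
    cpu->online = false;
}

/* Stand-in for the "report idle dead" step: RCU stops watching. */
static void sim_report_idle_dead(struct sim_cpu *cpu)
{
    cpu->rcu_watching = false;
}

/* Stand-in for the arch idle-dead step, which runs after the report. */
static void sim_idle_dead(struct sim_cpu *cpu, enum sim_teardown_site site)
{
    if (site == SIM_TEARDOWN_IN_IDLE_DEAD)
        sim_xive_teardown(cpu);
}

/* Run one simulated offline sequence and return the warning count. */
int sim_offline_cpu(enum sim_teardown_site site)
{
    struct sim_cpu cpu = { .online = true, .rcu_watching = true,
                           .rcu_warnings = 0 };

    sim_cpu_disable(&cpu, site);
    assert(!cpu.online);         /* the idle loop would now see it offline */
    sim_report_idle_dead(&cpu);
    sim_idle_dead(&cpu, site);
    return cpu.rcu_warnings;
}
```

In this model, sim_offline_cpu(SIM_TEARDOWN_IN_IDLE_DEAD) returns 1 and
sim_offline_cpu(SIM_TEARDOWN_IN_CPU_DISABLE) returns 0. That says nothing
about whether moving the real teardown is safe; it only shows the ordering.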

Thanks,
Nick