Re: [-next] Lockdep warnings

From: Leo Yan
Date: Tue Jul 26 2022 - 22:04:05 EST


On Tue, Jul 26, 2022 at 01:50:06PM +0100, Mark Rutland wrote:
> On Tue, Jul 26, 2022 at 01:40:40PM +0100, Mark Rutland wrote:
> > [Adding Peter; I suspect this is due to the cpuidle rework]
>
> Looking again I see the cpuidle rework isn't in next, so evidently not...
>
> Sorry for the noise!

I'd like to loop in Mike.L and CoreSight ML for CTI PM callbacks.
Please see below a comment for CTI spinlock usage.

> > I'll go give next a spin in a VM, but I suspect I might need real HW to see
> > this due to the way PSCI idle states work.
> >
> > Mark.
> >
> > On Tue, Jul 26, 2022 at 11:41:34AM +0100, Sudeep Holla wrote:
> > > I was seeing the below lockdep warnings on my arm64 Juno development
> > > platform almost 2 weeks back with -next. I wanted to check for similar
> > > reports before post and forgot.
> > >
> > > --->8
> > >
> > > DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
> > > hardirqs last enabled at (46157): cpuidle_enter_state+0x174/0x2b4
> > > WARNING: CPU: 5 PID: 0 at kernel/locking/lockdep.c:5506 check_flags+0x90/0x1e8
> > > hardirqs last disabled at (46158): el1_interrupt+0x2c/0xc8
> > > Modules linked in:
> > > softirqs last enabled at (46154): __do_softirq+0x2c0/0x388
> > > softirqs last disabled at (46139): __irq_exit_rcu+0x118/0x18c
> > > CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.19.0-rc6-next-20220714 #9
> > > pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > pc : check_flags+0x90/0x1e8
> > > lr : check_flags+0x90/0x1e8
> > > Call trace:
> > > check_flags+0x90/0x1e8
> > > lock_is_held_type+0x80/0x164
> > > rcu_read_lock_sched_held+0x40/0x7c
> > > trace_rcu_dyntick+0x5c/0x140
> > > ct_kernel_enter+0x78/0xd4
> > > ct_idle_exit+0x1c/0x44
> > > cpu_idle_poll+0x74/0xb8
> > > do_idle+0x90/0x2c4
> > > cpu_startup_entry+0x30/0x34
> > > secondary_start_kernel+0x130/0x144
> > > __secondary_switched+0xb0/0xb4
> > > irq event stamp: 64229
> > > hardirqs last enabled at (64229): cpu_idle_poll+0x40/0xb8
> > > hardirqs last disabled at (64228): do_idle+0xbc/0x2c4
> > > softirqs last enabled at (64190): __do_softirq+0x2c0/0x388
> > > softirqs last disabled at (64185): __irq_exit_rcu+0x118/0x18c
> > > ---[ end trace 0000000000000000 ]---
> > > possible reason: unannotated irqs-off.
> > > irq event stamp: 64229
> > > hardirqs last enabled at (64229): cpu_idle_poll+0x40/0xb8
> > > hardirqs last disabled at (64228): do_idle+0xbc/0x2c4
> > > softirqs last enabled at (64190): __do_softirq+0x2c0/0x388
> > > softirqs last disabled at (64185): __irq_exit_rcu+0x118/0x18c
> > >
> > > ----
> > >
> > > However I don't see the above warning with the latest -next. When I tried
> > > yesterday's -next now, I see a different warning. Not sure if they are
> > > related. I haven't tried to bisect.
> > >
> > > --->8
> > > =============================
> > > [ BUG: Invalid wait context ]
> > > 5.19.0-rc8-next-20220725 #38 Not tainted
> > > -----------------------------
> > > swapper/0/0 is trying to lock:
> > > (&drvdata->spinlock){....}-{3:3}, at: cti_cpu_pm_notify+0x54/0x114
> > > other info that might help us debug this:
> > > context-{5:5}
> > > 1 lock held by swapper/0/0:
> > > #0: (cpu_pm_notifier.lock){....}-{2:2}, at: cpu_pm_enter+0x2c/0x80
> > > stack backtrace:
> > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc8-next-20220725-00004-g599e6691ed8c #38
> > > Call trace:
> > > dump_backtrace+0xe8/0x108
> > > show_stack+0x18/0x4c
> > > dump_stack_lvl+0x90/0xc8
> > > dump_stack+0x18/0x54
> > > __lock_acquire+0xa70/0x32d0
> > > lock_acquire+0x160/0x308
> > > _raw_spin_lock+0x60/0xa0
> > > cti_cpu_pm_notify+0x54/0x114
> > > raw_notifier_call_chain_robust+0x50/0xd4
> > > cpu_pm_enter+0x48/0x80
> > > psci_enter_idle_state+0x34/0x74
> > > cpuidle_enter_state+0x120/0x2a8
> > > cpuidle_enter+0x38/0x50
> > > do_idle+0x1e8/0x2b8
> > > cpu_startup_entry+0x24/0x28
> > > kernel_init+0x0/0x1a0
> > > start_kernel+0x0/0x470
> > > start_kernel+0x34c/0x470
> > > __primary_switched+0xbc/0xc4

If we look into for this callback, we can see the lock sequence is:

cti_cpu_pm_notify()
`> cpu_pm_notify_robust():
`> raw_spin_lock_irqsave(cpu_pm_notifier.lock, flag) -> a raw spinlock
`> cti_cpu_pm_notify()
`> spin_lock(&drvdata->spinlock) -> a normal spinlock

A raw spinlock is not a sleepable lock, and normal spinlock can be a
sleepable lock (e.g. it can be a mutex after enabled PREEMPT_RT).

One solution is we can change to a raw spinlock in CTI driver, so this
can dismiss the lockdep warning.

Actually, I am a bit suspect if it's really necessary to use spinlock in
CTI PM callbacks, the reason is in CPU's idle flow, it will run into
idle thread context and disable the local IRQ, which means it likely has
no race condition with thread context and interrupt handler, so we can
remove the locking in PM callbacks.

Mike, could you check for this? Thanks!

Leo