Re: [BUG] lockdep splat using mmio tracer

From: Steven Rostedt
Date: Fri Dec 02 2022 - 07:27:21 EST


On Fri, 02 Dec 2022 10:51:02 +0100
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> On Thu, Dec 01 2022 at 21:31, Steven Rostedt wrote:
> > I hit this while testing ftrace on an x86 32 bit VM (I've just started
> > converting my tests to run on a VM, which is find new found bugs).
>
> Which is find new found grammar twists for the english language :)

That's what I get for writing bug reports past my bedtime. ;-)

>
> > [ 1111.130669] ================================
> > [ 1111.130670] WARNING: inconsistent lock state
> > [ 1111.130672] 6.1.0-rc6-test-00020-gbc591e45c100-dirty #245 Not tainted
> > [ 1111.130674] --------------------------------
> > [ 1111.130675] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > [ 1111.130676] kworker/0:0/3433 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > [ 1111.130679] d3dc2b90 (kmmio_lock){....}-{2:2}, at: kmmio_die_notifier+0x70/0x140
> > [ 1111.130690] {INITIAL USE} state was registered at:
> > [ 1111.130691] lock_acquire+0xa2/0x2b0
> > [ 1111.130696] _raw_spin_lock_irqsave+0x36/0x60
> > [ 1111.130701] register_kmmio_probe+0x43/0x210
> > [ 1111.130704] mmiotrace_ioremap+0x188/0x1b0
> > [ 1111.130706] __ioremap_caller.constprop.0+0x257/0x340
> > [ 1111.130711] ioremap_wc+0x12/0x20
>
> That's regular task context, while the int3, which is raised by the
> actual MMIO access, is considered to be NMI context. int3 has to be
> considered an NMI type exception because int3 can be hit anywhere, even
> in actual NMI context.

Yep, that's what I figured.

>
> > [ 1111.130924] lock_acquire.cold+0x31/0x37
> > [ 1111.130927] ? kmmio_die_notifier+0x70/0x140
> > [ 1111.130935] ? get_ins_imm_val+0xf0/0xf0
> > [ 1111.130938] _raw_spin_lock+0x2a/0x40
> > [ 1111.130942] ? kmmio_die_notifier+0x70/0x140
> > [ 1111.130945] kmmio_die_notifier+0x70/0x140
> > [ 1111.130948] ? arm_kmmio_fault_page+0xa0/0xa0
> > [ 1111.130951] atomic_notifier_call_chain+0x75/0x120
> > [ 1111.130955] notify_die+0x44/0x90
> > [ 1111.130959] exc_debug+0xd0/0x2a0
> > [ 1111.130965] ? exc_int3+0x100/0x100
> > [ 1111.130968] handle_exception+0x133/0x133
> > [ 1111.130970] EIP: qxl_draw_dirty_fb+0x2ae/0x440 [qxl]
>
> So for the mmio tracer there is no way that this happens:
>
> > [ 1111.130788] lock(kmmio_lock);
> > [ 1111.130789] <Interrupt>
> > [ 1111.130790] lock(kmmio_lock);
>
> but obviously lockdep cannot know that :)
>
> The quick and dirty, but IMO safe way out of this, is to convert that
> lock to an arch_spinlock and evade lockdep.

Thanks, I'll write up a patch for this.

>
> > I never hit this before, but then again, mmio tracer is showing output on
> > the VM which it did not do on the baremetal machine.
>
> It's exactly the same problem on bare metal.

Yep, but for some reason it never triggered on baremetal for me.

-- Steve