Re: [BUG] lockdep splat using mmio tracer

From: Thomas Gleixner
Date: Fri Dec 02 2022 - 04:51:15 EST


On Thu, Dec 01 2022 at 21:31, Steven Rostedt wrote:
> I hit this while testing ftrace on an x86 32 bit VM (I've just started
> converting my tests to run on a VM, which is find new found bugs).

Which is find new found grammar twists for the english language :)

> [ 1111.130669] ================================
> [ 1111.130670] WARNING: inconsistent lock state
> [ 1111.130672] 6.1.0-rc6-test-00020-gbc591e45c100-dirty #245 Not tainted
> [ 1111.130674] --------------------------------
> [ 1111.130675] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [ 1111.130676] kworker/0:0/3433 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 1111.130679] d3dc2b90 (kmmio_lock){....}-{2:2}, at: kmmio_die_notifier+0x70/0x140
> [ 1111.130690] {INITIAL USE} state was registered at:
> [ 1111.130691] lock_acquire+0xa2/0x2b0
> [ 1111.130696] _raw_spin_lock_irqsave+0x36/0x60
> [ 1111.130701] register_kmmio_probe+0x43/0x210
> [ 1111.130704] mmiotrace_ioremap+0x188/0x1b0
> [ 1111.130706] __ioremap_caller.constprop.0+0x257/0x340
> [ 1111.130711] ioremap_wc+0x12/0x20

That's regular task context, while the int3, which is raised by the
actual MMIO access, is considered to be NMI context. int3 has to be
considered an NMI type exception because int3 can be hit anywhere, even
in actual NMI context.

> [ 1111.130924] lock_acquire.cold+0x31/0x37
> [ 1111.130927] ? kmmio_die_notifier+0x70/0x140
> [ 1111.130935] ? get_ins_imm_val+0xf0/0xf0
> [ 1111.130938] _raw_spin_lock+0x2a/0x40
> [ 1111.130942] ? kmmio_die_notifier+0x70/0x140
> [ 1111.130945] kmmio_die_notifier+0x70/0x140
> [ 1111.130948] ? arm_kmmio_fault_page+0xa0/0xa0
> [ 1111.130951] atomic_notifier_call_chain+0x75/0x120
> [ 1111.130955] notify_die+0x44/0x90
> [ 1111.130959] exc_debug+0xd0/0x2a0
> [ 1111.130965] ? exc_int3+0x100/0x100
> [ 1111.130968] handle_exception+0x133/0x133
> [ 1111.130970] EIP: qxl_draw_dirty_fb+0x2ae/0x440 [qxl]

So for the mmio tracer there is no way that this happens:

> [ 1111.130788] lock(kmmio_lock);
> [ 1111.130789] <Interrupt>
> [ 1111.130790] lock(kmmio_lock);

but obviously lockdep cannot know that :)

The quick and dirty, but IMO safe way out of this, is to convert that
lock to an arch_spinlock and evade lockdep.

> I never hit this before, but then again, mmio tracer is showing output on
> the VM which it did not do on the baremetal machine.

It's exactly the same problem on bare metal.

Thanks,

tglx