Re: [PATCH] lockdep: report broken irq restoration

From: Andy Lutomirski
Date: Wed Dec 09 2020 - 14:06:55 EST


On Wed, Dec 9, 2020 at 10:33 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> We generally expect local_irq_save() and local_irq_restore() to be
> paired and sanely nested, and so local_irq_restore() expects to be
> called with irqs disabled. Thus, within local_irq_restore() we only
> trace irq flag changes when unmasking irqs.
>
> This means that a sequence such as:
>
> | local_irq_disable();
> | local_irq_save(flags);
> | local_irq_enable();
> | local_irq_restore(flags);
>
> ... is liable to break things, as the local_irq_restore() would mask
> IRQs without tracing this change.
>
> We don't consider such sequences to be a good idea, so let's define
> those as forbidden, and add tooling to detect such broken cases.
>
> This patch adds debug code to WARN() when local_irq_restore() is called
> with irqs enabled. As local_irq_restore() is expected to pair with
> local_irq_save(), it should never be called with interrupts enabled.
>
> To avoid the possibility of circular header dependencies between
> irqflags.h and bug.h, the warning is handled in a separate C file.
>
> The new code is all conditional on a new CONFIG_DEBUG_IRQFLAGS symbol
> which is independent of CONFIG_TRACE_IRQFLAGS. As noted above, such cases
> will confuse lockdep, so CONFIG_DEBUG_LOCKDEP now selects
> CONFIG_DEBUG_IRQFLAGS.
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> include/linux/irqflags.h | 18 +++++++++++++++++-
> kernel/locking/Makefile | 1 +
> kernel/locking/irqflag-debug.c | 12 ++++++++++++
> lib/Kconfig.debug | 7 +++++++
> 4 files changed, 37 insertions(+), 1 deletion(-)
> create mode 100644 kernel/locking/irqflag-debug.c
>
> Note: as things stand this'll blow up at boot-time on x86 within the io-apic
> timer_irq_works() boot-time test. I've proposed a fix for that:
>
> https://lore.kernel.org/lkml/20201209181514.GA14235@C02TD0UTHF1T.local/
>
> ... which was sufficient for booting under QEMU without splats. I'm giving this
> a soak under Syzkaller on arm64 as that booted cleanly to begin with.
>
> Mark.
>
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 3ed4e8771b64..bca3c6fa8270 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -220,10 +220,26 @@ do { \
>
> #else /* !CONFIG_TRACE_IRQFLAGS */
>
> +#ifdef CONFIG_DEBUG_IRQFLAGS
> +extern void warn_bogus_irq_restore(bool *warned);
> +#define check_bogus_irq_restore() \
> + do { \
> + static bool __section(".data.once") __warned; \
> + if (unlikely(!raw_irqs_disabled())) \
> + warn_bogus_irq_restore(&__warned); \
> + } while (0)

What's the benefit of having a per-caller __warned instead of just
having a single global one in warn_bogus_irq_restore()?
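
To make the question concrete, here is a rough userspace sketch of the two
options. It is not kernel code: every sim_* name is a hypothetical stand-in,
and since the body of warn_bogus_irq_restore() is not in the hunk quoted
above, the once-only logic shown here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for raw_irqs_disabled(); not a kernel API. */
static bool sim_irqs_disabled = true;

/* Counts how many warnings were actually printed. */
static int sim_warnings;

/* Option A: the caller owns the "already warned" flag (as in the patch). */
static void sim_warn_per_caller(bool *warned)
{
	if (*warned)
		return;
	*warned = true;
	sim_warnings++;
	fprintf(stderr, "bogus irq restore (per-caller flag)\n");
}

/* Each expansion of this macro gets its own static flag. */
#define sim_check_per_caller()				\
	do {						\
		static bool __warned;			\
		if (!sim_irqs_disabled)			\
			sim_warn_per_caller(&__warned);	\
	} while (0)

/* Option B: one flag, private to the warning function. */
static void sim_warn_global(void)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	sim_warnings++;
	fprintf(stderr, "bogus irq restore (global flag)\n");
}

#define sim_check_global()				\
	do {						\
		if (!sim_irqs_disabled)			\
			sim_warn_global();		\
	} while (0)

/* Two distinct call sites for each option. */
static void site_one_per_caller(void) { sim_check_per_caller(); }
static void site_two_per_caller(void) { sim_check_per_caller(); }
static void site_one_global(void)     { sim_check_global(); }
static void site_two_global(void)     { sim_check_global(); }

/*
 * Hit site one twice and site two once with "irqs enabled", and return
 * the number of warnings printed. The static flags persist, so each
 * helper only gives a meaningful count on its first call.
 */
static int sim_count_per_caller(void)
{
	sim_warnings = 0;
	sim_irqs_disabled = false;
	site_one_per_caller();
	site_one_per_caller();
	site_two_per_caller();
	return sim_warnings;
}

static int sim_count_global(void)
{
	sim_warnings = 0;
	sim_irqs_disabled = false;
	site_one_global();
	site_one_global();
	site_two_global();
	return sim_warnings;
}
```

In this sketch the per-caller flag warns once per offending call site, while
the global flag warns once in total and puts no static bool in every
expansion of the macro.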