Re: [PATCH v2 03/14] notifiers: Assert that RCU is watching in notify_die

From: Andy Lutomirski
Date: Mon Jun 22 2015 - 13:38:15 EST


On Mon, Jun 22, 2015 at 10:24 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Mon, Jun 22, 2015 at 10:03:30AM -0700, Andy Lutomirski wrote:
>> The rcu_lockdep_assert should be merely a warning, not a full OOPS.
>
> It is still pretty huge, see below.
>
>> I think that, if rcu_lockdep_assert hangs, then we should fix that
>> rather than avoiding debugging checks.
>
> The RCU assertion firing might be unrelated to the oops happening and
> could prevent us from seeing the real splat.
>
> [ 0.048815]
> [ 0.050493] ===============================
> [ 0.052005] [ INFO: suspicious RCU usage. ]
> [ 0.056007] 4.1.0-rc8+ #4 Not tainted
> [ 0.060005] -------------------------------
> [ 0.064005] arch/x86/kernel/cpu/amd.c:677 BOINK!
> [ 0.066758]
> [ 0.066758] other info that might help us debug this:
> [ 0.066758]
> [ 0.068006]
> [ 0.068006] rcu_scheduler_active = 0, debug_locks = 0
> [ 0.072005] no locks held by swapper/0/0.
> [ 0.076005]
> [ 0.076005] stack backtrace:
> [ 0.080006] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc8+ #4
> [ 0.083331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 0.084021] 0000000000000000 ffffffff81967eb8 ffffffff816709c7 0000000000000000
> [ 0.092005] ffffffff81975580 ffffffff81967ee8 ffffffff8109e8cd 0000000000000000
> [ 0.097227] ffffffff81a3aec0 ffffffff81cad9c0 ffffffff81cb42c0 ffffffff81967f38
> [ 0.104005] Call Trace:
> [ 0.106021] [<ffffffff816709c7>] dump_stack+0x4f/0x7b
> [ 0.108007] [<ffffffff8109e8cd>] lockdep_rcu_suspicious+0xfd/0x130
> [ 0.112007] [<ffffffff81017f74>] init_amd+0x34/0x560
> [ 0.116007] [<ffffffff810164e2>] identify_cpu+0x242/0x3b0
> [ 0.119068] [<ffffffff81c27172>] identify_boot_cpu+0x10/0x7e
> [ 0.120006] [<ffffffff81c27214>] check_bugs+0x9/0x2d
> [ 0.124007] [<ffffffff81c1fe8e>] start_kernel+0x40e/0x425
> [ 0.128007] [<ffffffff81c1f495>] x86_64_start_reservations+0x2a/0x2c
> [ 0.132009] [<ffffffff81c1f582>] x86_64_start_kernel+0xeb/0xef

But if we OOPS, we'll OOPS after the lockdep splat and the lockdep
splat will scroll off the screen, right? Am I missing something here?

In traps.c, notify_die is called before the actual OOPS code is invoked.
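
To make the ordering concrete, here is a rough userspace sketch, not kernel code. Every name in it (record, notify_die_sketch, trap_handler_sketch, the event enum) is made up for illustration, and it assumes the notifier chain does not stop the trap from proceeding to the OOPS path. It only shows that a warning emitted from inside notify_die lands in the log ahead of the OOPS output:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event log standing in for the console; not a kernel API. */
enum event { EV_RCU_SPLAT, EV_NOTIFIERS, EV_OOPS };

#define MAX_EVENTS 8
static enum event events[MAX_EVENTS];
static size_t nr_events;

static void record(enum event ev)
{
	if (nr_events < MAX_EVENTS)
		events[nr_events++] = ev;
}

/* Stand-in for notify_die() carrying the proposed RCU assertion. */
static void notify_die_sketch(bool rcu_watching)
{
	if (!rcu_watching)
		record(EV_RCU_SPLAT);	/* warning only; execution continues */
	record(EV_NOTIFIERS);		/* the die notifier chain would run here */
}

/* Stand-in for a trap handler in traps.c (assumes no notifier stops it). */
static void trap_handler_sketch(bool rcu_watching)
{
	notify_die_sketch(rcu_watching);
	record(EV_OOPS);		/* the OOPS output comes last */
}
```

Calling trap_handler_sketch(false) records the splat, then the notifiers, then the OOPS, so whatever scrolls off the screen first is the lockdep splat rather than the OOPS.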

--Andy