Re: [RFC PATCH 00/11] printk: safe printing in NMI context

From: Jiri Kosina
Date: Wed Jun 18 2014 - 10:53:57 EST


On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've
> > > > seen it happening for real) cause a complete, undebuggable, silent hang
> > > > of machine (deadlock in NMI context)
> > >
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack. Would that help?
> >
> > Well, that would make unfortunately the information provided by RCU stall
> > detector rather useless ... workqueue-based stack dumping is very unlikely
> > to point its finger to the real offender, as it'd be coming way too late.
>
> I would not use workqueues, but rather have the CPU detecting the
> stall grovel through the other CPUs' stacks, which is what I do now for
> architectures that don't support NMI-based stack dumps. Would that be
> a reasonable approach?

That would indeed solve lockups induced by RCU stall detector (and we
should convert sysrq stack dumping code to use the same mechanism
afterwards).

But then, the kernel is still polluted by quite a few instances of

WARN_ON(in_nmi())

BUG_ON(in_nmi())

if (in_nmi())
	printk(....)

which need to be fixed separately afterwards anyway.
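
To illustrate what fixing such a call site might look like, here is a
minimal user-space sketch of one possible approach: stash the message in
a separate buffer while in NMI context and copy it to the log later from
a normal context, so the NMI path never touches the log. Everything in it
(fake_in_nmi, nmi_buf, console, safe_print(), flush_nmi_buf()) is a
hypothetical stand-in and not a kernel API, and it is not code from this
patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static bool fake_in_nmi;	/* models the result of in_nmi() */
static char nmi_buf[256];	/* models a buffer only the NMI path writes */
static size_t nmi_len;
static char console[256];	/* models the lock-protected log */
static size_t console_len;

/* Append msg to dst, truncating if it does not fit. */
static void append(char *dst, size_t *len, size_t cap, const char *msg)
{
	size_t n = strlen(msg);
	size_t room;

	assert(*len < cap);
	room = cap - 1 - *len;
	if (n > room)
		n = room;
	memcpy(dst + *len, msg, n);
	*len += n;
	dst[*len] = '\0';
}

/* In "NMI" context only stash the message; never touch the console. */
static void safe_print(const char *msg)
{
	if (fake_in_nmi)
		append(nmi_buf, &nmi_len, sizeof(nmi_buf), msg);
	else
		append(console, &console_len, sizeof(console), msg);
}

/* Called later from a normal context to drain what the NMI path stashed. */
static void flush_nmi_buf(void)
{
	append(console, &console_len, sizeof(console), nmi_buf);
	nmi_len = 0;
	nmi_buf[0] = '\0';
}
```

The sketch ignores concurrency entirely; a real implementation would
need per-CPU buffers and a safe way to trigger the flush.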

Thanks,

--
Jiri Kosina
SUSE Labs