Re: [PATCH] panic: add TAINT_SOFTLOCKUP

From: Andrew Morton
Date: Mon Jun 23 2014 - 18:51:31 EST


On Mon, 23 Jun 2014 17:45:00 -0500 Josh Hunt <johunt@xxxxxxxxxx> wrote:

> On 06/23/2014 05:11 PM, Andrew Morton wrote:
> > On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <johunt@xxxxxxxxxx> wrote:
> >
> >> This taint flag will be set if the system has ever entered a softlockup
> >> state. Similar to TAINT_WARN it is useful to know whether or not the system
> >> has been in a softlockup state when debugging.
> >>
> >> ...
> >>
> >> @@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >>
> >> if (softlockup_panic)
> >> panic("softlockup: hung tasks");
> >> + add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> >> __this_cpu_write(soft_watchdog_warn, true);
> >> } else
> >> __this_cpu_write(soft_watchdog_warn, false);
> >
> > Would make more sense to have applied the taint *before* calling
> > panic()?
>
> Andrew
>
> Yep, that's a good call. Thanks. Do you want me to send a v2 or did you
> take care of it?

I fixed it up.

> In addition to adding the softlockup taint flag, do you think it'd be
> reasonable to add another flag for page allocation failures? I think
> it'd be nice to be able to account for these conditions somehow without
> having to parse dmesg, etc. As with the softlockup flag, it's helpful to
> know if your system had encountered a page allocation failure at some
> point before the crash or whatever you're debugging.

I don't know, really. Allocation failures are often an expected thing
as drivers try to work out how much memory they can allocate. Those
things can be screened out by testing __GFP_NOWARN. GFP_ATOMIC
failures should probably be ignored, except for when they shouldn't.
But even then, allocation failures are somewhat common. And recency is
a concern: an allocation failure 10 minutes ago is unlikely to be
relevant.

But that's just me waving hands around. I'd be interested to hear from
people whose kernels crash more often than mine, and from those whose
job is to support them (i.e., distro people?).
