Re: [PATCH] panic: add TAINT_SOFTLOCKUP

From: Josh Hunt
Date: Tue Jun 24 2014 - 10:22:29 EST


On 06/23/2014 05:51 PM, Andrew Morton wrote:
On Mon, 23 Jun 2014 17:45:00 -0500 Josh Hunt <johunt@xxxxxxxxxx> wrote:

On 06/23/2014 05:11 PM, Andrew Morton wrote:
On Tue, 3 Jun 2014 22:12:35 -0400 Josh Hunt <johunt@xxxxxxxxxx> wrote:

This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the system
has been in a softlockup state when debugging.

...

@@ -329,6 +329,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
+		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
 		__this_cpu_write(soft_watchdog_warn, true);
 	} else
 		__this_cpu_write(soft_watchdog_warn, false);

Wouldn't it make more sense to apply the taint *before* calling
panic()?

Andrew
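
The ordering point can be illustrated with a small user-space sketch.
panic() does not return, so a taint added after the call is never
recorded when softlockup_panic is set. Every name prefixed with
sketch_ or SKETCH_ below is a hypothetical stand-in invented for this
illustration; none of it is the kernel's taint API or the actual
fix-up that was applied.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a taint bit; not the kernel's real value. */
enum { SKETCH_TAINT_SOFTLOCKUP = 1u << 0 };

static unsigned int sketch_tainted;     /* models the global taint mask */
static unsigned int sketch_panic_taint; /* taint mask observed at panic time */
static bool sketch_panicked;

static void sketch_add_taint(unsigned int flag)
{
	sketch_tainted |= flag;
}

/*
 * Models panic(): snapshots the taint mask it would report. The real
 * panic() never returns; this stand-in returns so it can be tested.
 */
static void sketch_panic(const char *msg)
{
	sketch_panic_taint = sketch_tainted;
	sketch_panicked = true;
	fprintf(stderr, "panic: %s (taint mask 0x%x)\n", msg, sketch_tainted);
}

/* Suggested ordering: set the taint first, then panic if configured to. */
static void sketch_report_softlockup(bool softlockup_panic)
{
	sketch_add_taint(SKETCH_TAINT_SOFTLOCKUP);
	if (softlockup_panic)
		sketch_panic("softlockup: hung tasks");
}
```

Calling sketch_report_softlockup(true) leaves the softlockup bit set in
the mask that the stand-in panic path sees, which is the property the
reordering is meant to provide.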

Yep, that's a good call. Thanks. Do you want me to send a v2 or did you
take care of it?

I fixed it up.

In addition to adding the softlockup taint flag, do you think it'd be
reasonable to add another flag for page allocation failures? I think
it'd be nice to be able to account for these conditions somehow without
having to parse dmesg, etc. As with the softlockup flag, it's helpful to
know if your system had encountered a page allocation failure at some
point before the crash or whatever you're debugging.

I don't know, really. Allocation failures are often an expected thing
as drivers try to work out how much memory they can allocate. Those
things can be screened out by testing __GFP_NOWARN. GFP_ATOMIC
failures should probably be ignored, except for when they shouldn't.
But even then, allocation failures are somewhat common. And recency is
a concern: an allocation failure 10 minutes ago is unlikely to be
relevant.
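
As a rough user-space sketch of the two concerns above, screening out
callers that expect failure and accounting for recency: the code below
ignores failures flagged with a stand-in for __GFP_NOWARN and remembers
only the time of the last counted failure. All names and values
(SKETCH_GFP_NOWARN, the tracker struct, the time window) are
assumptions made for illustration, not kernel definitions, and the
GFP_ATOMIC question is deliberately left unmodeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for __GFP_NOWARN; not the kernel's real bit value. */
#define SKETCH_GFP_NOWARN 0x1u

/* Hypothetical record of the most recent allocation failure that counted. */
struct sketch_alloc_fail_tracker {
	bool seen;               /* has any counted failure occurred? */
	uint64_t last_fail_secs; /* timestamp of the last counted failure */
};

/* Record a failure unless the caller signalled that failure is expected. */
static void sketch_note_alloc_failure(struct sketch_alloc_fail_tracker *t,
				      unsigned int gfp_flags,
				      uint64_t now_secs)
{
	if (gfp_flags & SKETCH_GFP_NOWARN)
		return; /* caller is probing for memory; not interesting */
	t->seen = true;
	t->last_fail_secs = now_secs;
}

/*
 * True if a counted failure happened within window_secs of now_secs.
 * Assumes now_secs is not earlier than the recorded timestamp.
 */
static bool sketch_failure_is_recent(const struct sketch_alloc_fail_tracker *t,
				     uint64_t now_secs,
				     uint64_t window_secs)
{
	return t->seen && now_secs - t->last_fail_secs <= window_secs;
}
```

A single sticky taint bit cannot express the recency part; a timestamp
like the one tracked here would be one way to let a debugger decide
whether an old failure is still relevant.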

But that's just me waving hands around. I'd be interested to hear from
people whose kernels crash more often than mine, and from those whose
job is to support them (i.e. distro people?).


Anyone you'd suggest adding to this thread to get other feedback about
tracking page allocation failures? I could also spin up a patch and cc
them.

Thanks
Josh