Re: [PATCH] kernel/watchdog: Prevent false hardlockup on overloaded system

From: Aaron Tomlin
Date: Thu Dec 15 2016 - 13:41:22 EST


On Tue 2016-12-06 11:17 -0500, Don Zickus wrote:
> On an overloaded system, it is possible that a change in the watchdog threshold
> can be delayed long enough to trigger a false positive.
>
> This can easily be achieved by having a cpu spinning indefinitely on a task,
> while another cpu updates the watchdog threshold.
>
> What happens is while trying to park the watchdog threads, the hrtimers on the
> other cpus trigger and reprogram themselves with the new slower watchdog
> threshold. Meanwhile, the nmi watchdog is still programmed with the old faster
> threshold.
>
> Because the one cpu is blocked, it prevents the thread parking on the other
> cpus from completing, which is needed to shut down the nmi watchdog and
> reprogram it correctly. As a result, a false positive from the nmi watchdog is
> reported.
>
> Fix this by setting a park_in_progress flag that suppresses all lockup
> reports until the parking is complete.
>
> Fix provided by Ulrich Obergfell.
>
> Cc: Ulrich Obergfell <uobergfe@xxxxxxxxxx>
> Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
> ---
> include/linux/nmi.h | 1 +
> kernel/watchdog.c | 9 +++++++++
> kernel/watchdog_hld.c | 3 +++
> 3 files changed, 13 insertions(+)

Looks fine to me.
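
For anyone following the thread without the diff at hand, the idea in the
description can be sketched in user-space C roughly as below. This is only an
illustration under assumptions, not the patch itself: park_in_progress is the
flag named in the description, while hardlockup_should_report(), begin_park()
and end_park() are hypothetical names made up for this sketch, and C11 atomics
stand in for whatever primitives the kernel code actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Non-zero while the watchdog threads are being parked.
 * The flag name comes from the patch description. */
static atomic_int park_in_progress;

/* Hypothetical stand-in for the check made on the NMI watchdog side:
 * while parking is in progress, never report a hard lockup, because the
 * hrtimer and the NMI watchdog may be running with different thresholds. */
static bool hardlockup_should_report(bool hrtimer_stalled)
{
	if (atomic_load(&park_in_progress))
		return false;

	return hrtimer_stalled;
}

/* Hypothetical helpers bracketing the threshold update. */
static void begin_park(void)
{
	atomic_store(&park_in_progress, 1);
}

static void end_park(void)
{
	atomic_store(&park_in_progress, 0);
}
```

With this shape, a stall observed between begin_park() and end_park() is
ignored, and normal reporting resumes once parking has completed.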

Reviewed-by: Aaron Tomlin <atomlin@xxxxxxxxxx>

--
Aaron Tomlin