Re: [PATCH] kernel/hung_task.c: use timeout diff when timeout is updated

From: Andrew Morton
Date: Mon Dec 21 2015 - 16:45:51 EST


On Mon, 21 Dec 2015 20:45:23 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > And it would be helpful to add a comment to hung_timeout_jiffies()
> > which describes the behaviour and explains the reasons for it.
>
> But before doing it, I'd like to confirm the hung task maintainer's wishes.
>
> The reason I proposed this patch is that I want to add a watchdog task
> which emits warning messages when memory allocations are stalling.
> http://lkml.kernel.org/r/201512130033.ABH90650.FtFOMOFLVOJHQS@xxxxxxxxxxxxxxxxxxx
>
> But concurrently emitting multiple backtraces is problematic. Concurrent
> output from the hung task watchdog and the memory allocation stall watchdog
> is very likely to occur, because other tasks are likely to be stuck in
> uninterruptible sleep when one task gets stuck in a memory allocation.
>
> Therefore, I started trying to use the same thread for both watchdogs.
> A draft patch is at
> http://lkml.kernel.org/r/201512170011.IAC73451.FLtFMSJHOQFVOO@xxxxxxxxxxxxxxxxxxx .
>
> If you prefer the current hung task behavior, I'll try to preserve it.
> In that case, I might use two threads and try to make the two watchdogs
> mutually exclusive using console_lock() or something like that.
>
> So, may I ask what your preference is?

I've added linux-mm to Cc. Please never forget that.

The general topic here is "add more diagnostics around an out-of-memory
event". Clearly we need this, but Michal is working on the same thing
as part of his "OOM detection rework v4" work, so can we please do the
appropriate coordination and review there?

Preventing watchdog-triggered backtraces from messing each other up is
of course a good idea. Your malloc watchdog patch adds a surprising
amount of code and adding yet another kernel thread is painful but
perhaps it's all worth it. It's a matter of people reviewing, testing
and using the code in realistic situations and that process has hardly
begun, alas.

Sorry, that was waffly but I don't feel able to be more definite at
this time.