Re: [RFC PATCH] watchdog: Adding softwatchdog

From: Christophe Leroy
Date: Sat Apr 24 2021 - 08:21:23 EST




On 24/04/2021 at 12:25, Peter Enderborg wrote:
This is not a rebooting watchdog. Its function is to take actions
other than a hard reboot. On many complex systems there is some
kind of manager that monitors slow systems and takes action on them.
Android has its lowmemorykiller (lmkd), desktops have earlyoom.
This watchdog can be used to help such a monitor by performing some
basic action to keep the monitor running.

It can also be used standalone. This adds a policy that
kills the process with the highest oom_score_adj, using the
oom functions to do it quickly. I think it is a good use case
for the patch. Memory situations can be problematic for
software that monitors the system, but other policies
should also be possible, like picking tasks from a memcg,
specific UIDs, or whatever is low priority.


I'm not sure I understand the reasoning behind the choice of oom logic to decide which task to kill.

Usually a watchdog will detect whether a task is using 100% of the CPU time. If such a task exists, it is the one running, not another one that has a huge amount of memory allocated but spends something like 1% of CPU time.

So if there is a task to kill by a watchdog, I would say it is the current task.
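
To make the difference between the two policies concrete, here is a rough userspace sketch. It is not code from the patch: struct task_sample, pick_by_oom_score_adj() and pick_by_cpu() are hypothetical names, and the CPU share is assumed to be sampled over one watchdog period by some other means.

```c
#include <stddef.h>

/* Hypothetical snapshot of a task; NOT a kernel structure. */
struct task_sample {
	int pid;
	int oom_score_adj;        /* same range as /proc/<pid>/oom_score_adj */
	unsigned int cpu_percent; /* assumed CPU share over one watchdog period */
};

/* Policy described in the patch: the highest oom_score_adj is chosen. */
static const struct task_sample *
pick_by_oom_score_adj(const struct task_sample *t, size_t n)
{
	const struct task_sample *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || t[i].oom_score_adj > best->oom_score_adj)
			best = &t[i];
	return best; /* NULL when the list is empty */
}

/* Policy suggested in this reply: the task hogging the CPU is chosen. */
static const struct task_sample *
pick_by_cpu(const struct task_sample *t, size_t n)
{
	const struct task_sample *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || t[i].cpu_percent > best->cpu_percent)
			best = &t[i];
	return best; /* NULL when the list is empty */
}
```

With one task that has a high oom_score_adj but is almost idle, and another with a neutral score that is spinning on the CPU, the two helpers return different tasks. Only the second one is the task that is actually starving the system.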


Another remark: you are using regular timers, as far as I understand. I remember having problems with that in the past; it required the use of hrtimers. I can't remember the details exactly, but you can look at commit https://github.com/linuxppc/linux/commit/1ff688209

Christophe