Re: [RFC PATCH] watchdog: Adding softwatchdog

From: Peter.Enderborg
Date: Sat Apr 24 2021 - 13:21:43 EST


On 4/24/21 7:07 PM, Guenter Roeck wrote:
> On 4/24/21 8:27 AM, Peter.Enderborg@xxxxxxxx wrote:
>> On 4/24/21 4:41 PM, Guenter Roeck wrote:
>>> On 4/24/21 3:25 AM, Peter Enderborg wrote:
>>>> This is not a rebooting watchdog. Its function is to take other
>>>> actions than a hard reboot. On many complex systems there is some
>>>> kind of manager that monitors and takes action on slow systems.
>>>> Android has its lowmemorykiller (lmkd), desktops have earlyoom.
>>>> This watchdog can be used to help the monitor perform some basic
>>>> action to keep the monitor running.
>>>>
>>>> It can also be used standalone. This adds a policy that kills
>>>> the process with the highest oom_score_adj, using the oom
>>>> functions to do it quickly. I think it is a good use case
>>>> for the patch. Memory situations can be problematic for
>>>> software that monitors the system, but other policies
>>>> should also be possible, like picking tasks from a memcg,
>>>> specific UIDs, or whatever is low priority.
>>>> ---
>>> NACK. Besides this not following the new watchdog API, the task
>>> of a watchdog is to reset the system on failure. Its task is most
>>> definitely not to re-implement the oom killer in any way, shape,
>>> or form.
>>>
>>> Guenter
>> Do you have a better idea of where the re-invented wheel might
>> fit, if not in the watchdog API?
>>
> The watchdog subsystem does support pretimeouts and a variety
> of configurable pretimeout notifiers. A pretimeout notifier which
> invokes the oom killer might be something worth discussing, though
> it would require an audience large enough to determine if it really
> makes sense (instead of improving the existing oom killer itself).
>
> A possible alternative might be to introduce watchdog pretimeout
> callbacks; this has actually been proposed in another context but
> without an upstream user. The oom killer could then subscribe to
> watchdog pretimeouts and perform the action suggested here if
> a pretimeout is observed. Again, such an approach might be worth
> discussing with a larger audience.
>
> Thanks,
> Guenter

What would count as a larger audience? I have included the mm list,
the mm maintainer, and the global list.