Re: [PATCH] watchdog: wdat_wdt: Set the min and max timeout values properly

From: Guenter Roeck
Date: Mon Aug 08 2022 - 07:36:54 EST


On 8/5/22 15:07, Jean Delvare wrote:
The wdat_wdt driver is misusing the min_hw_heartbeat_ms field. This
field should only be used when the hardware watchdog device should not
be pinged more frequently than a specific period. The ACPI WDAT
"Minimum Count" field, on the other hand, specifies the minimum
timeout value that can be set. This corresponds to the min_timeout
field in Linux's watchdog infrastructure.

Setting min_hw_heartbeat_ms instead can cause pings to the hardware
to be delayed when there is no reason for that, eventually leading to
unexpected firing of the watchdog timer (and thus unexpected reboot).
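
To illustrate the failure mode described above, here is a minimal userspace
sketch of how a minimum-heartbeat limit defers pings. It is not the kernel
code: the struct and function names are hypothetical, the numbers in the note
below are made up, and it assumes that a ping requested sooner than
min_hw_heartbeat_ms after the previous hardware ping is postponed until that
interval has elapsed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a watchdog; not the kernel's struct. */
struct wd_model {
	uint64_t last_hw_ping_ms;     /* time of the last ping that reached the hardware */
	uint32_t min_hw_heartbeat_ms; /* hardware must not be pinged more often than this */
};

/*
 * Return how long (in ms) a ping requested at now_ms has to be deferred.
 * 0 means the ping can go to the hardware immediately.
 */
static uint64_t ping_deferral_ms(const struct wd_model *wd, uint64_t now_ms)
{
	uint64_t earliest;

	/* Time is assumed to be monotonic in this model. */
	assert(now_ms >= wd->last_hw_ping_ms);
	earliest = wd->last_hw_ping_ms + wd->min_hw_heartbeat_ms;
	return now_ms < earliest ? earliest - now_ms : 0;
}
```

In this model, if a minimum timeout of 5000 ms is stored in
min_hw_heartbeat_ms, a ping requested 2000 ms after the previous one is held
back by 3000 ms, which leaves no margin when the hardware timeout is also
5000 ms. With min_hw_heartbeat_ms left at 0, the same ping is not deferred.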

I'm also changing max_hw_heartbeat_ms to max_timeout for symmetry,
although the use of this one isn't fundamentally wrong, but there is
also no reason to enable the software-driven ping mechanism for the
wdat_wdt driver.


Normally I would reject this because it is not only unnecessary and
unrelated to the problem at hand (remember: one logical change per patch),
but it is hidden in an unrelated patch, it will only make life harder
later on if/when full milli-second timeouts are introduced, and it may
result in unexpected limitations on the maximum timeout. However, Mike
accepted it, so who am I to complain.

Signed-off-by: Jean Delvare <jdelvare@xxxxxxx>
Fixes: 058dfc767008 ("ACPI / watchdog: Add support for WDAT hardware watchdog")
Cc: Wim Van Sebroeck <wim@xxxxxxxxxxxxxxxxxx>
Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
Cc: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
---
Untested, as I have no supported hardware at hand.

Note to the watchdog subsystem maintainers: I must say I find the
whole thing pretty confusing.

First of all, the name symmetry between min_hw_heartbeat_ms and
max_hw_heartbeat_ms, while these properties are completely unrelated,
is heavily misleading. max_hw_heartbeat_ms is really max_hw_timeout
and should be renamed to that IMHO, if we keep it at all.


Variable names are hardly ever perfect. I resist renaming variables
to avoid rename wars. Feel free to submit patches to improve the
documentation if you like.

Secondly, the coexistence of max_timeout and max_hw_heartbeat_ms is
also making the code pretty hard to understand and get right.
Historically, max_timeout was already supposed to be the maximum
hardware timeout value. I don't understand why a new field with that
meaning was introduced, subsequently changing the original meaning of
max_timeout to become a software-only limit... but only if
max_hw_heartbeat_ms is set.


Code is hardly ever perfect. Feel free to submit patches to help
improve understanding if you like.

To be honest, I'm not sold on the idea of a software-emulated
maximum timeout value above what the hardware can do, but if doing
that makes sense in certain situations, then I believe it should be
implemented as a boolean flag (named emulate_large_timeout, for
example) to complement max_timeout instead of a separate time value.
Is there a reason I'm missing, why it was not done that way?

There are watchdogs with very low maximum timeout values, sometimes less than
3 seconds. gpio-wdt is one example - some have a maximum value of 2.5 seconds.
rzn1_wd is even more extreme with a maximum of 1 second. With such low values,
accuracy is important, second-based limits are insufficient, and there is an
actual need for software timeout handling on top of hardware.
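
As a rough illustration of software timeout handling on top of hardware, the
sketch below models the idea in plain userspace C. All names are hypothetical
and the policy is an assumption chosen for clarity (keep pinging the hardware
on the user's behalf only as long as a ping would not push the hardware expiry
past the user's own deadline); it is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a hardware limit plus a longer user-requested timeout. */
struct soft_wd {
	uint32_t max_hw_heartbeat_ms; /* longest timeout the hardware supports */
	uint32_t timeout_ms;          /* timeout requested by userspace */
	uint64_t last_user_ping_ms;   /* last keepalive received from userspace */
};

/* Timeout actually programmed into the hardware: the smaller of the two. */
static uint32_t hw_timeout_ms(const struct soft_wd *wd)
{
	return wd->timeout_ms < wd->max_hw_heartbeat_ms ?
	       wd->timeout_ms : wd->max_hw_heartbeat_ms;
}

/*
 * Should a background worker ping the hardware at now_ms?
 * Only if the hardware would then still expire no later than the
 * userspace deadline (last user ping + requested timeout).
 */
static bool worker_should_ping(const struct soft_wd *wd, uint64_t now_ms)
{
	uint64_t deadline = wd->last_user_ping_ms + wd->timeout_ms;

	return now_ms + hw_timeout_ms(wd) <= deadline;
}
```

With a 1000 ms hardware limit and a 10000 ms requested timeout, this model
keeps pinging until 9000 ms after the last userspace keepalive and then lets
the hardware expire at the 10000 ms mark. It also shows why millisecond
resolution matters for limits such as 2.5 seconds.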

At the same time, there is actually a need to make timeouts milli-second based
instead of second-based, for uses such as medical devices where timeouts need
to be short and accurate. The only reason for not implementing this is that
the proposals I have seen so far (including mine) were too messy for my liking,
and I never had the time to clean them up. Reverting milli-second support would
be the completely wrong direction.

Currently, a comment in watchdog.h claims that max_timeout is ignored
when max_hw_heartbeat_ms is set. However in watchdog_dev.c, sysfs
attribute max_timeout is created unconditionally, and
max_hw_heartbeat_ms doesn't have a sysfs attribute. So userspace has
no way to know if max_timeout is the hardware limit, or whether
software emulation will kick in for a specified timeout value. Also,
there is no complaint if both max_hw_heartbeat_ms and max_timeout
are set.
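
The rule stated in that comment can be modelled as follows. This is a sketch
of the behaviour as described above, not a copy of the kernel code; the struct
and function names are hypothetical, and treating a max_timeout of 0 as "no
limit" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical container for the limits discussed in this thread. */
struct wd_limits {
	unsigned int min_timeout;         /* seconds */
	unsigned int max_timeout;         /* seconds, 0 assumed to mean "no limit" */
	unsigned int max_hw_heartbeat_ms; /* milliseconds, 0 = not set */
};

/* Return true if a requested timeout of t seconds must be rejected. */
static bool timeout_invalid(const struct wd_limits *l, unsigned int t)
{
	if (t < l->min_timeout)
		return true;
	/* max_timeout is ignored as soon as max_hw_heartbeat_ms is set. */
	if (l->max_hw_heartbeat_ms)
		return false;
	return l->max_timeout && t > l->max_timeout;
}
```

In this model, setting both fields raises no error: max_timeout silently stops
being enforced, which matches the ambiguity described above.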

As mentioned before, code is hardly ever perfect. Patches to improve the
situation are welcome.

Thanks,
Guenter