Re: [PATCH] watchdog: core: make sure the watchdog_worker always works

From: Christophe LEROY
Date: Fri Dec 08 2017 - 05:33:24 EST




Le 07/12/2017 Ã 15:45, Guenter Roeck a ÃcritÂ:
On 12/07/2017 02:38 AM, Christophe Leroy wrote:
When running a command like 'chrt -f 99 dd if=/dev/zero of=/dev/null',
the watchdog_worker fails to service the HW watchdog and the
HW watchdog fires long before the watchdog soft timeout.

At the moment, the watchdog_worker is invoked as a delayed work.
Delayed works are handled by non realtime kernel threads. The
WQ_HIGHPRI flag only increases the niceness of that threads.

This patchs replaces the delayed work logic by hrtimer, in order to

s/patchs/patch/

ensure that the watchdog_worker will already have priority.

always ?


Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
---
 drivers/watchdog/watchdog_dev.c | 87 +++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 46 deletions(-)


[snip]


I am not sure if it is a good idea to execute the ping in this context.
After all, this code may block (eg if the ping is sent to an i2c device,
but also due to the use of a mutex). Looking into other drivers using
high resolution timers, it is common to have the actual work done by
a worker.
Of course, that might defeat the purpose of this exercise, so the
question is if it is possible to have a realtime worker. Can you explore ?

kthread_delayed_work is likely our solution:
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b56c0d8937e665a27d90517ee7a746d0aa05af46
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22597dc3d97b1ead2aca201397415a1a84bf2b26

I submitted v2 of the patch based on that instead of hrtimer. Also implies less modifications to the code.

Christophe


Thanks,
Guenter

 }
 /*
@@ -230,7 +236,7 @@ static void watchdog_ping_work(struct work_struct *work)
 static int watchdog_start(struct watchdog_device *wdd)
 {
ÂÂÂÂÂ struct watchdog_core_data *wd_data = wdd->wd_data;
-ÂÂÂ unsigned long started_at;
+ÂÂÂ ktime_t started_at;
ÂÂÂÂÂ int err;
ÂÂÂÂÂ if (watchdog_active(wdd))
@@ -238,7 +244,7 @@ static int watchdog_start(struct watchdog_device *wdd)
ÂÂÂÂÂ set_bit(_WDOG_KEEPALIVE, &wd_data->status);
-ÂÂÂ started_at = jiffies;
+ÂÂÂ started_at = ktime_get();
ÂÂÂÂÂ if (watchdog_hw_running(wdd) && wdd->ops->ping)
ÂÂÂÂÂÂÂÂÂ err = wdd->ops->ping(wdd);
ÂÂÂÂÂ else
@@ -919,10 +925,8 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
ÂÂÂÂÂ wd_data->wdd = wdd;
ÂÂÂÂÂ wdd->wd_data = wd_data;
-ÂÂÂ if (!watchdog_wq)
-ÂÂÂÂÂÂÂ return -ENODEV;
-
-ÂÂÂ INIT_DELAYED_WORK(&wd_data->work, watchdog_ping_work);
+ÂÂÂ hrtimer_init(&wd_data->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ÂÂÂ wd_data->timer.function = watchdog_ping_work;
ÂÂÂÂÂ if (wdd->id == 0) {
ÂÂÂÂÂÂÂÂÂ old_wd_data = wd_data;
@@ -958,7 +962,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
ÂÂÂÂÂ }
ÂÂÂÂÂ /* Record time of most recent heartbeat as 'just before now'. */
-ÂÂÂ wd_data->last_hw_keepalive = jiffies - 1;
+ÂÂÂ wd_data->last_hw_keepalive = ktime_get();

Please assign ktime_sub(ktime_get(), 1);

Thanks,
Guenter

ÂÂÂÂÂ /*
ÂÂÂÂÂÂ * If the watchdog is running, prevent its driver from being unloaded,
@@ -968,7 +972,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
ÂÂÂÂÂÂÂÂÂ if (handle_boot_enabled) {
ÂÂÂÂÂÂÂÂÂÂÂÂÂ __module_get(wdd->ops->owner);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ kref_get(&wd_data->kref);
-ÂÂÂÂÂÂÂÂÂÂÂ queue_delayed_work(watchdog_wq, &wd_data->work, 0);
+ÂÂÂÂÂÂÂÂÂÂÂ hrtimer_start(&wd_data->timer, 0, HRTIMER_MODE_REL);
ÂÂÂÂÂÂÂÂÂ } else {
ÂÂÂÂÂÂÂÂÂÂÂÂÂ pr_info("watchdog%d running and kernel based pre-userspace handler disabled\n",
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wdd->id);
@@ -1006,7 +1010,7 @@ static void watchdog_cdev_unregister(struct watchdog_device *wdd)
ÂÂÂÂÂÂÂÂÂ watchdog_stop(wdd);
ÂÂÂÂÂ }
-ÂÂÂ cancel_delayed_work_sync(&wd_data->work);
+ÂÂÂ hrtimer_cancel(&wd_data->timer);
ÂÂÂÂÂ kref_put(&wd_data->kref, watchdog_core_data_release);
 }
@@ -1111,13 +1115,6 @@ int __init watchdog_dev_init(void)
 {
ÂÂÂÂÂ int err;
-ÂÂÂ watchdog_wq = alloc_workqueue("watchdogd",
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
-ÂÂÂ if (!watchdog_wq) {
-ÂÂÂÂÂÂÂ pr_err("Failed to create watchdog workqueue\n");
-ÂÂÂÂÂÂÂ return -ENOMEM;
-ÂÂÂ }
-
ÂÂÂÂÂ err = class_register(&watchdog_class);
ÂÂÂÂÂ if (err < 0) {
ÂÂÂÂÂÂÂÂÂ pr_err("couldn't register class\n");
@@ -1135,7 +1132,6 @@ int __init watchdog_dev_init(void)
 err_alloc:
ÂÂÂÂÂ class_unregister(&watchdog_class);
 err_register:
-ÂÂÂ destroy_workqueue(watchdog_wq);
ÂÂÂÂÂ return err;
 }
@@ -1149,7 +1145,6 @@ void __exit watchdog_dev_exit(void)
 {
ÂÂÂÂÂ unregister_chrdev_region(watchdog_devt, MAX_DOGS);
ÂÂÂÂÂ class_unregister(&watchdog_class);
-ÂÂÂ destroy_workqueue(watchdog_wq);
 }
 module_param(handle_boot_enabled, bool, 0444);