[PATCH] power: suspend: Add suspend timeout handler

From: josephjang
Date: Tue Oct 20 2020 - 04:15:47 EST


On Tue, Oct 20, 2020 at 02:22:26PM +0800, Joseph Jang wrote:
> Add sleep timer and timeout handler to prevent device stuck during suspend/
> resume process. The timeout handler will dump disk sleep task at first
> round timeout and trigger kernel panic at second round timeout.
> The default timer for each round is defined in
> CONFIG_PM_SLEEP_TIMER_TIMEOUT.
>
> Signed-off-by: Joseph Jang <josephjang@xxxxxxxxxx>
> ---
> MAINTAINERS | 2 +
> include/linux/console.h | 1 +
> include/linux/suspend_timer.h | 90 +++++++++++++++++++++++++++++++++++

Why is this file in include/linux/ if you only ever call it from one .c
file?

I just followed include/linux/suspend.h and created the new header file in the same directory.
If there is a better location for the new header file, please let me know.

> --- /dev/null
> +++ b/include/linux/suspend_timer.h
> @@ -0,0 +1,90 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_SLEEP_TIMER_H
> +#define _LINUX_SLEEP_TIMER_H
> +
> +#include <linux/sched/debug.h>
> +
> +#ifdef CONFIG_PM_SLEEP_MONITOR
> +struct sleep_timer {
> + struct task_struct *tsk;
> + struct timer_list timer;
> +};
> +
> +#define DECLARE_SLEEP_TIMER(st) \
> + struct sleep_timer st
> +
> +/**
> + * init_sleep_timer - Initialize sleep timer.
> + * @st: Sleep timer to initialize.
> + * @func: Sleep timer timeout handler.
> + */
> +static void init_sleep_timer(struct sleep_timer *st, void (*func))
> +{
> + struct timer_list *timer = &st->timer;
> +
> + timer_setup(timer, func, 0);
> +}
> +
> +/**
> + * start_sleep_timer - Enable sleep timer to monitor suspend thread.
> + * @st: Sleep timer to enable.
> + */
> +static void start_sleep_timer(struct sleep_timer *st)
> +{
> + struct timer_list *timer = &st->timer;
> +
> + st->tsk = current;
> +
> + /* use same timeout value for both suspend and resume */
> + timer->expires = jiffies + HZ * CONFIG_PM_SLEEP_TIMER_TIMEOUT;
> + add_timer(timer);
> +}
> +
> +/**
> + * stop_sleep_timer - Disable sleep timer.
> + * @st: sleep timer to disable.
> + */
> +static void stop_sleep_timer(struct sleep_timer *st)
> +{
> + struct timer_list *timer = &st->timer;
> +
> + del_timer_sync(timer);
> +}
> +
> +/**
> + * sleep_timeout_handler - sleep timer timeout handler.
> + * @t: The timer list that sleep timer depends on.
> + *
> + * Called when suspend thread has timeout suspending or resuming.
> + * Dump all uninterruptible tasks' call stack and call panic() to
> + * reboot system in second round timeout.
> + */
> +static void sleep_timeout_handler(struct timer_list *t)
> +{
> + struct sleep_timer *st = from_timer(st, t, timer);
> + static int timeout_count;
> +
> + pr_info("Sleep timeout (timer is %d seconds)\n",
> + (CONFIG_PM_SLEEP_TIMER_TIMEOUT));
> + show_stack(st->tsk, NULL, KERN_EMERG);
> + show_state_filter(TASK_UNINTERRUPTIBLE);
> +
> + if (timeout_count < 1) {
> + timeout_count++;
> + start_sleep_timer(st);
> + return;
> + }
> +
> + if (console_is_suspended())
> + resume_console();
> +
> + panic("Sleep timeout and panic\n");
> +}
> +#else
> +#define DECLARE_SLEEP_TIMER(st)
> +#define init_sleep_timer(x, y)
> +#define start_sleep_timer(x)
> +#define stop_sleep_timer(x)
> +#endif
> +
> +#endif /* _LINUX_SLEEP_TIMER_H */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index a7320f07689d..9e2b274db0c1 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -207,6 +207,21 @@ config PM_SLEEP_DEBUG
> def_bool y
> depends on PM_DEBUG && PM_SLEEP
>
> +config PM_SLEEP_MONITOR
> + bool "Linux kernel suspend/resume process monitor"
> + depends on PM_SLEEP
> + help
> + This option will enable sleep timer to prevent device stuck
> + during suspend/resume process. Sleep timeout handler will dump
> + disk sleep task at first round timeout and trigger kernel panic
> + at second round timeout. The timer for each round is defined in
> + CONFIG_PM_SLEEP_TIMER_TIMEOUT.

I thought we already had a watchdog for all of this, why not just always
add this to that code, for that config option?


Yes, we already have DPM_WATCHDOG to monitor device power management.
But we hit a real suspend hang that DPM_WATCHDOG could not cover.
We propose a debug feature with wider coverage, PM_SLEEP_MONITOR, which
covers not only device PM but also core PM hang issues.

DPM_WATCHDOG is for device driver power management in drivers/base/power/main.c,
while PM_SLEEP_MONITOR is for core power management in kernel/power/suspend.c.
I think it is fine to let users select whether they need device PM coverage only or not.


And why isn't the watchdog sufficient for you? Why are you "open
coding" a watchdog timer logic here at all???


Yes, we referred to DPM_WATCHDOG and extended the watchdog debugging to core PM,
because we hit a real case that DPM_WATCHDOG did not cover.
I think of PM_SLEEP_MONITOR as a debug feature that extends DPM_WATCHDOG.
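
For reference, the two-round escalation described in the patch (dump on the
first timeout, panic on the second) can be modelled in userspace roughly as
below. This is only a sketch: struct sleep_monitor, enum sleep_action and the
sleep_monitor_*() helpers are hypothetical names, not kernel APIs, and the
real handler calls show_state_filter() and panic() where this model only
prints and returns a value. The patch keeps the count in a function-local
static; the model keeps it in the struct so it can be exercised in isolation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum sleep_action { SLEEP_DUMP_AND_REARM, SLEEP_PANIC };

struct sleep_monitor {
	int timeout_count;	/* rounds that have already expired */
	bool armed;		/* models whether the timer is pending */
};

/* Models start_sleep_timer(): arm the timer for one more round. */
static void sleep_monitor_start(struct sleep_monitor *m)
{
	assert(m != NULL);
	m->armed = true;
}

/* Models stop_sleep_timer(): suspend/resume finished in time. */
static void sleep_monitor_stop(struct sleep_monitor *m)
{
	assert(m != NULL);
	m->armed = false;
}

/* Models one expiry of the timer, i.e. one call to the timeout handler. */
static enum sleep_action sleep_monitor_expire(struct sleep_monitor *m)
{
	assert(m != NULL && m->armed);
	m->armed = false;

	/* Both rounds dump state first (the patch dumps task stacks here). */
	puts("sleep timeout: dumping blocked tasks");

	if (m->timeout_count < 1) {
		/* First round: count it and give the system one more round. */
		m->timeout_count++;
		sleep_monitor_start(m);
		return SLEEP_DUMP_AND_REARM;
	}

	/* Second round: the patch would call panic() at this point. */
	return SLEEP_PANIC;
}
```

A caller would bracket the monitored path with sleep_monitor_start() and
sleep_monitor_stop(); only a path that never reaches the stop call sees
sleep_monitor_expire() return SLEEP_PANIC on the second expiry.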


thanks,

greg k-h


Thank you,
Joseph.