Re: [RFC] softlockup: serialized softlockup's log

From: Doug Anderson
Date: Fri Nov 17 2023 - 16:45:56 EST


Hi,

On Mon, Nov 13, 2023 at 6:16 PM <lizhe.67@xxxxxxxxxxxxx> wrote:
>
> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>
> If multiple CPUs trigger softlockup at the same time, the softlockup's
> logs will appear staggeredly in dmesg, which will affect the viewing of
> the logs for developer. Since the code path for outputting softlockup logs
> is not a kernel hotspot and the performance requirements for the code
> are not strict, locks are used to serialize the softlockup log output
> to improve the readability of the logs.
>
> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> ---
> kernel/watchdog.c | 3 +++
> 1 file changed, 3 insertions(+)

This seems reasonable to me. It might be interesting to talk about in
your commit message how this interacts with the various options. From
code inspection, I believe:

* If `softlockup_all_cpu_backtrace` then this is a no-op since other
CPUs will be prevented from running the printing code while one is
already printing.

* I'm not 100% sure what happens if `softlockup_panic` is set and I
haven't sat down to test this myself. Will one CPUs panic message
interleave the other CPUs traces. I guess in the end both CPUs will
call panic()? Maybe you could experiment and describe the behavior in
your commit message?


> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5cd6d4e26915..8324ac194d0a 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -448,6 +448,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> struct pt_regs *regs = get_irq_regs();
> int duration;
> int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> + static DEFINE_SPINLOCK(watchdog_timer_lock);

I'd be tempted to define this outside the scope of this function. I
need to dig more, but I'm pretty sure I've seen cases where a soft
lockup could trigger while I was trying to print traces for a
hardlockup, so it might be useful to grab the same spinlock in both
places...