Re: [RFC] softlockup: serialized softlockup's log

From: lizhe . 67
Date: Tue Nov 21 2023 - 22:53:18 EST


On Fri, 17 Nov 2023 13:45:21 <dianders@xxxxxxxxxxxx> wrote:
>>
>> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>>
>> If multiple CPUs trigger softlockup at the same time, the softlockup's
>> logs will appear staggeredly in dmesg, which will affect the viewing of
>> the logs for developer. Since the code path for outputting softlockup logs
>> is not a kernel hotspot and the performance requirements for the code
>> are not strict, locks are used to serialize the softlockup log output
>> to improve the readability of the logs.
>>
>> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>> ---
>> kernel/watchdog.c | 3 +++
>> 1 file changed, 3 insertions(+)
>
>This seems reasonable to me. It might be interesting to talk about in
>your commit message how this interacts with the various options. From
>code inspection, I believe:

Thanks for your advice. I will send a V2 patch to optimize my commit
message.

>* If `softlockup_all_cpu_backtrace` then this is a no-op since other
>CPUs will be prevented from running the printing code while one is
>already printing.

Yes your are right. If `softlockup_all_cpu_backtrace` is set, interleaving
problem is gone. And we don't need to worry about interleaving problem
in function trigger_allbutcpu_cpu_backtrace() because it has already
serialized the logs.

>* I'm not 100% sure what happens if `softlockup_panic` is set and I
>haven't sat down to test this myself. Will one CPUs panic message
>interleave the other CPUs traces. I guess in the end both CPUs will
>call panic()? Maybe you could experiment and describe the behavior in
>your commit message?

I did experiments and checked the implementation of the panic function.
I have not reproduced interleaving problem with this patch. The panic
function internally serializes the panic's logs by using variable
'panic_cpu'. Besides, function panic() will stop other cpu before outputing
logs, so I think the interleaving problem between softlockup logs from
cpu A and the panic logs from softlockup cpu B does not exist.

>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 5cd6d4e26915..8324ac194d0a 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -448,6 +448,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> struct pt_regs *regs = get_irq_regs();
>> int duration;
>> int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
>> + static DEFINE_SPINLOCK(watchdog_timer_lock);
>
>I'd be tempted to define this outside the scope of this function. I
>need to dig more, but I'm pretty sure I've seen cases where a soft
>lockup could trigger while I was trying to print traces for a
>hardlockup, so it might be useful to grab the same spinlock in both
>places...

I've tried several times, but unfortunately I haven't been able to
reproduce the problem you mentioned. My concern is that if the lock
is shared, there will be potential deadlock issues because hardlockup
exploits nmi.