Re: [PATCH v2 1/1] ppc/crash: Skip spinlocks during crash

From: Leonardo Bras
Date: Fri Mar 27 2020 - 14:16:07 EST


Hello Michael,

On Fri, 2020-03-27 at 14:50 +1100, Michael Ellerman wrote:
> Hi Leonardo,
>
> Leonardo Bras <leonardo@xxxxxxxxxxxxx> writes:
> > During a crash, there is chance that the cpus that handle the NMI IPI
> > are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
> > will cause a deadlock. (rtas_lock and printk logbuf_log as of today)
>
> Please give us more detail on how those locks are causing you trouble, a
> stack trace would be good if you have it.

Sure, I have hit it in printf and rtas_call, as said before.

After crash_send_ipi(), it's tested how many cpus_in_crash are there,
and once they hit the total value, it's printed "IPI complete". This
printk call itself already got stuck in spin_lock, for example.

Here are the stack traces:

#0 arch_spin_lock
#1 do_raw_spin_lock
#2 __raw_spin_lock
#3 _raw_spin_lock
#4 vprintk_emit
#5 vprintk_func
#7 crash_kexec_prepare_cpus
#8 default_machine_crash_shutdown
#9 machine_crash_shutdown
#10 __crash_kexec
#11 crash_kexec
#12 oops_end

#0 arch_spin_lock
#1 lock_rtas ()
#2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
#3 ics_rtas_mask_real_irq (hw_irq=4100)
#4 machine_kexec_mask_interrupts
#5 default_machine_crash_shutdown
#6 machine_crash_shutdown
#7 __crash_kexec
#8 crash_kexec
#9 oops_end

> > This is a problem if the system has kdump set up, given if it crashes
> > for any reason kdump may not be saved for crash analysis.
> >
> > Skip spinlocks after NMI IPI is sent to all other cpus.
>
> We don't want to add overhead to all spinlocks for the life of the
> system, just to handle this one case.

I understand.
Other than this patch, I would propose doing something uglier, like
forcing the said locks to unlocked state when cpus_in_crash hits it's
maximum value, before printing "IPI complete".
Creating similar functions that don't lock, just for this case, looks
like overkill to me.

Do you have any other suggestion?

> There's already a flag that is set when the system is crashing,
> "oops_in_progress", maybe we need to use that somewhere to skip a lock
> or do an early return.

I think that would not work, because oops_in_progress should be 0 here:
oops_end() calls bust_spinlocks(0) before calling crash_kexec(), and
bust_spinlocks(0) will decrement oops_in_progress.
(just verified, it's 0 before printing "IPI complete").

Thank you the feedback, :)

Best regards,
Leonardo

Attachment: signature.asc
Description: This is a digitally signed message part