Re: [PATCH v3] arm64: sdei: abort running SDEI handlers during crash

From: James Morse
Date: Fri Jun 23 2023 - 10:48:28 EST


Hi Scott,

On 07/06/2023 20:55, D Scott Phillips wrote:
> Interrupts are blocked in SDEI context, per the SDEI spec: "The client
> interrupts cannot preempt the event handler." If we crashed in the SDEI
> handler-running context (as with ACPI's AGDI) then we need to clean up the
> SDEI state before proceeding to the crash kernel so that the crash kernel
> can have working interrupts.
>
> Track the active SDEI handler per-cpu so that we can COMPLETE_AND_RESUME
> the handler, discarding the interrupted context.

I still argue this is a bug in the firmware. The text you indicate was intended to mean
'set PSTATE.I'. The whole 'GIC abstraction' use-case was added to SDEI quite late. You'll
note linux doesn't support any of that. The normal-world OS was never supposed to have
to guess whether


Regardless, I've not managed to convince the firmware people to fix this.


> diff --git a/arch/arm64/include/asm/sdei.h b/arch/arm64/include/asm/sdei.h
> index 4292d9bafb9d..dc2f038a61ef 100644
> --- a/arch/arm64/include/asm/sdei.h
> +++ b/arch/arm64/include/asm/sdei.h
> @@ -17,6 +17,11 @@
>
> #include <asm/virt.h>
>
> +#ifdef CONFIG_ARM_SDE_INTERFACE
> +DECLARE_PER_CPU(struct sdei_registered_event *, sdei_active_normal_event);
> +DECLARE_PER_CPU(struct sdei_registered_event *, sdei_active_critical_event);
> +#endif

Please drop these #ifdef guards, they prevent using IS_ENABLED() to let the compiler check
the code, but dead-code eliminate it. This makes it harder for the CI to catch any issues
without having to 'get lucky' with the Kconfig options.


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ab2a6e33c052..e49f72b5fb63 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -1007,6 +1007,12 @@ SYM_CODE_START(__sdei_asm_handler)
> ldrb w4, [x19, #SDEI_EVENT_PRIORITY]
> #endif
>
> + cbnz w4, 1f

Is this meant to be considering SDEI_EVENT_PRIORITY? That is #ifdef'd for VMAP_STACK/SCS.
Without those options set, you're checking to the stack pointer here.
You probably need to drop the #ifdefs.


> + adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
> + b 2f
> +1: adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
> +2: str x19, [x5]

Almost every assembly block in here has a comment describing what its doing.
| /* Store the registered-event for crash_smp_send_stop() */


> #ifdef CONFIG_VMAP_STACK
> /*
> * entry.S may have been using sp as a scratch register, find whether
> @@ -1072,6 +1078,13 @@ SYM_CODE_START(__sdei_asm_handler)
>
> ldr_l x2, sdei_exit_mode


| /* Clear the registered-event seen by crash_smp_send_stop() */

> + ldrb w3, [x4, #SDEI_EVENT_PRIORITY]
> + cbnz w3, 1f
> + adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
> + b 2f
> +1: adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
> +2: str xzr, [x5]
> +
> alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
> sdei_handler_exit exit_mode=x2
> alternative_else_nop_endif


> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4e8327264255..ea1595b51590 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c

> @@ -1069,7 +1067,28 @@ void crash_smp_send_stop(void)
> pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
> cpumask_pr_args(&mask));
>
> +skip_ipi:
> sdei_mask_local_cpu();
> +
> +#ifdef CONFIG_ARM_SDE_INTERFACE

Please make this a C "if(IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {", this lets the compiler
check this is all valid, before dead-code eliminating it if the Kconfig option isn't
selected. This lets the CI systems check this code regardless of the Kconfig option.


> + /*
> + * If the crash happened in an SDEI event handler then we need to
> + * finish the handler with the firmware so that we can have working
> + * interrupts in the crash kernel.
> + */
> + if (__this_cpu_read(sdei_active_critical_event)) {
> + pr_warn("SMP: stopped CPUs from SDEI critical event handler "
> + "context, attempting to finish handler.\n");
> + sdei_handler_abort();
> + __this_cpu_write(sdei_active_critical_event, NULL);
> + }
> + if (__this_cpu_read(sdei_active_normal_event)) {
> + pr_warn("SMP: stopped CPUs from SDEI normal event handler "
> + "context, attempting to finish handler.\n");
> + sdei_handler_abort();
> + __this_cpu_write(sdei_active_normal_event, NULL);
> + }
> +#endif

You might want to wrap this up as a sdei_handler_abort(), just to hide the ugly details
from here. The asm version would then need a double-underscore prefix.


> }


With the entry.S #ifdef issue fixed:
Reviewed-by: James Morse <james.morse@xxxxxxx>


Thanks,

James