Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

From: Yazen Ghannam
Date: Thu Jun 15 2023 - 11:34:34 EST


On 6/15/2023 11:20 AM, Borislav Petkov wrote:
> On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote:
> > @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
> >  	if (ctx_info->reg_arr_size < 48)
> >  		return -EINVAL;
> > -	mce_setup(&m);
> > -
> > -	m.extcpu = -1;
> > -	m.socketid = -1;
> > -
> > -	for_each_possible_cpu(cpu) {
> > -		if (cpu_data(cpu).initial_apicid == lapic_id) {
> > -			m.extcpu = cpu;
> > -			m.socketid = cpu_data(m.extcpu).phys_proc_id;
> > +	for_each_possible_cpu(cpu)
> > +		if (cpu_data(cpu).initial_apicid == lapic_id)
> >  			break;
> > -		}
> > -	}
> > -	m.apicid = lapic_id;
> > +	if (smp_call_function_single(cpu, __mce_setup, &m, 1))
>
> I can see the following call-chain from NMI context which is a no-no:
>
> ghes_notify_nmi
> |-> ghes_in_nmi_spool_from_list
>  |-> ghes_in_nmi_queue_one_entry
>   |-> __ghes_panic
>    |-> __ghes_print_estatus
>     |-> cper_estatus_print
>      |-> cper_estatus_print_section
>       |-> cper_print_proc_ia
>        |-> arch_apei_report_x86_error
>         |-> apei_smca_report_x86_error
>          |-> smp_call_function_single

Right, but in practice SMCA errors are not reported through GHES at runtime. They will only come in through BERT at boot time. There aren't any plans to change this, so the NMI issue won't be encountered.

I can include this info in the commit message and/or code comments. Is this okay?

We can solve the NMI issue if it ever comes up in the future, unless there's an obvious change that would avoid it now. Any suggestions?
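
For illustration only, one possible shape for such a change is to skip the cross-CPU call when running in NMI context and fill in just the fields that can be read from per-CPU data. The following is a rough userspace model of that idea, not kernel code and not a proposed patch: `struct cpu_info`, `struct mce_record`, `find_cpu_by_apicid()`, `prep_record()` and the `full_setup` flag are all hypothetical names, and the `in_nmi_ctx` parameter merely stands in for the kernel's in_nmi() check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for kernel data; not the real structures. */
struct cpu_info {
	uint32_t initial_apicid;
	int	 phys_proc_id;
};

struct mce_record {
	int	 extcpu;
	int	 socketid;
	uint32_t apicid;
	bool	 full_setup;	/* models "setup ran on the target CPU" */
};

/* Return the index of the CPU with the given APIC ID, or -1 if none matches. */
static int find_cpu_by_apicid(const struct cpu_info *cpus, size_t nr_cpus,
			      uint32_t lapic_id)
{
	for (size_t cpu = 0; cpu < nr_cpus; cpu++)
		if (cpus[cpu].initial_apicid == lapic_id)
			return (int)cpu;

	return -1;
}

/*
 * Fill in a record for the CPU identified by lapic_id.
 * in_nmi_ctx models in_nmi(): when set, the (modelled) cross-CPU call is
 * skipped and only data available from the lookup table is used.
 */
static int prep_record(struct mce_record *m, const struct cpu_info *cpus,
		       size_t nr_cpus, uint32_t lapic_id, bool in_nmi_ctx)
{
	int cpu;

	assert(m != NULL);

	cpu = find_cpu_by_apicid(cpus, nr_cpus, lapic_id);
	if (cpu < 0)
		return -EINVAL;

	/* These fields come from table data, so they are safe in any context. */
	m->extcpu = cpu;
	m->socketid = cpus[cpu].phys_proc_id;
	m->apicid = lapic_id;

	/* Only non-NMI context may do the equivalent of a cross-CPU call. */
	m->full_setup = !in_nmi_ctx;

	return 0;
}
```

In this model a caller in NMI context still gets the CPU number, socket ID and APIC ID, but none of the values that would require running on the target CPU. Whether that trade-off is acceptable for CPER decoding is an open question.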

Thanks,
Yazen