RE: [PATCH v3 1/2] x86: mce: kexec: switch MCE handler for kexec/kdump

From: Luck, Tony
Date: Tue Mar 03 2015 - 13:09:53 EST


+static void machine_check_under_kdump(struct pt_regs *regs, long error_code)
+{
+	if (mca_cfg.kdump_cpu == smp_processor_id())
+		pr_emerg("MCE triggered when kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n");

I'm worried about the SRAR case here. Your code just returns, which will trigger the same machine check again. The system will spin forever printing this message.

I think you have to look at MCG_STATUS and scan the machine check banks to make a choice. There are some simple cases:

MCG_STATUS.RIPV=0 -> cannot return (where will the cpu go - you have no idea!)
SRAO -> safe to just return
SRAR -> should not return

But the rest may require some thought. If there is a PCC=1 error, then you may end up with a corrupt dump. Perhaps this case will already be covered by RIPV==0?
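
Something along these lines is roughly what the simple cases look like. This is a stand-alone sketch, not kernel code: kdump_mce_classify() and the enum are made-up names, the bank status values are passed in as an array instead of being read with rdmsr, and treating PCC=1 as "do not return" is a conservative assumption, since that case is still an open question. The bit positions are the architectural ones from the Intel SDM.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM, machine-check architecture). */
#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */
#define MCI_STATUS_VAL  (1ULL << 63)  /* bank holds a valid error */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_S    (1ULL << 56)  /* signaled via machine check */
#define MCI_STATUS_AR   (1ULL << 55)  /* action required */

/* Hypothetical names, for illustration only. */
enum kdump_mce_action {
	KDUMP_MCE_RETURN,     /* safe to return and carry on dumping */
	KDUMP_MCE_NO_RETURN,  /* returning would re-trigger or go nowhere sane */
};

static enum kdump_mce_action
kdump_mce_classify(uint64_t mcg_status, const uint64_t *bank_status,
		   size_t nbanks)
{
	size_t i;

	/* RIPV=0: there is no valid address to return to. */
	if (!(mcg_status & MCG_STATUS_RIPV))
		return KDUMP_MCE_NO_RETURN;

	for (i = 0; i < nbanks; i++) {
		uint64_t st = bank_status[i];

		/* Skip empty banks and corrected errors. */
		if (!(st & MCI_STATUS_VAL) || !(st & MCI_STATUS_UC))
			continue;

		/* Assumption: treat PCC=1 as fatal for the dump. */
		if (st & MCI_STATUS_PCC)
			return KDUMP_MCE_NO_RETURN;

		/* SRAR (S=1, AR=1): returning re-executes the faulting access. */
		if ((st & MCI_STATUS_S) && (st & MCI_STATUS_AR))
			return KDUMP_MCE_NO_RETURN;

		/* SRAO (S=1, AR=0) falls through: safe to return. */
	}

	return KDUMP_MCE_RETURN;
}
```

What to do in the KDUMP_MCE_NO_RETURN case (halt this cpu, give up on the dump, etc.) is a separate decision that the sketch does not try to answer.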

-Tony