RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

From: Luck, Tony
Date: Wed Jan 06 2021 - 19:27:07 EST


> Please see below for an updated patch.

Yes. That worked:

[ 78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946151] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946153] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

I guess more than one CPU hit the timeout, so your new message was printed twice
before the panic code took over?

Once again, the whole of socket 1 is reported as MIA, rather than just the pair of
threads on one of the cores there. Still, that's a useful improvement: it eliminates
the other three sockets on this system.
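
For anyone wanting to sanity-check a holdout list like the one above, here is a rough sketch that expands the CPU-list string from the log and maps each CPU to a socket. The helper names are made up for illustration, and the topology constants (4 sockets, 24 cores per socket, hyperthread siblings numbered 96 higher) are only an assumption that happens to fit the ranges printed here; they are not taken from the patch or from the machine's actual topology files.

```python
def parse_cpulist(cpulist):
    """Expand a CPU-list string such as "24-47,120-143" into a sorted list of ints."""
    cpus = set()
    for part in cpulist.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# ASSUMPTION: hypothetical topology, chosen only because it matches the log above.
CORES_PER_SOCKET = 24
TOTAL_CORES = 96  # 4 sockets x 24 cores; sibling threads assumed at cpu + 96


def assumed_socket(cpu):
    """Map a logical CPU number to a socket under the assumed numbering."""
    return (cpu % TOTAL_CORES) // CORES_PER_SOCKET


holdouts = parse_cpulist("24-47,120-143")
sockets = sorted({assumed_socket(cpu) for cpu in holdouts})

print("holdout CPUs:", len(holdouts))
print("sockets involved (assumed topology):", sockets)
```

Under that assumed numbering, both ranges land on a single socket, which matches the observation that one whole socket is reported rather than one core's pair of threads.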

Tested-by: Tony Luck <tony.luck@xxxxxxxxx>

-Tony