Re: [patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

From: Thomas Gleixner
Date: Wed Apr 19 2023 - 08:38:30 EST


On Wed, Apr 19 2023 at 11:38, Thomas Gleixner wrote:
> On Tue, Apr 18 2023 at 22:10, Paul Menzel wrote:
>> Am 18.04.23 um 10:40 schrieb Thomas Gleixner:
>>> Can you please provide the output of cpuid?
>>
>> Of course. Here the top, and the whole output is attached.
>
> Thanks for the data. Can you please apply the debug patch below and
> provide the dmesg output? Just the line which is added by the patch is
> enough. You can boot with cpuhp.parallel=off so you don't have to wait for
> 10 seconds.

Borislav found a machine which also refuses to boot. It turns out
the debug patch was spot on:

[ 0.462724] .... node #0, CPUs: #1
[ 0.462731] smpboot: Kicking AP alive: 17
[ 0.465723] #2
[ 0.465732] smpboot: Kicking AP alive: 18
[ 0.467641] #3
[ 0.467641] smpboot: Kicking AP alive: 19

So the kernel gets APICIDs 17, 18 and 19 from ACPI, but CPUID leaf 0x1
ebx[31:24], which is the initial APICID, has:

CPU1 0x01
CPU2 0x02
CPU3 0x03

Which means the APICID to Linux CPU number lookup based on CPUID leaf 0x01
fails for all of them and stops them dead in the low-level startup code.

IOW, the BIOS assigns random numbers to the AP APICs for whatever
raisins, which leaves the parallel startup low-level code up a creek
without a paddle, except for actually reading the APICID back from the
APIC. *SHUDDER*

I'm leaning towards disabling the CPUID leaf 0x01 based discovery and being
done with it.

Thanks,

tglx