Re: [PATCH v6 07/11] x86/smpboot: Disable parallel boot for AMD CPUs

From: Kim Phillips
Date: Fri Feb 03 2023 - 14:48:53 EST


+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

- take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
- remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

- a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
- Zen 2-, Zen 3-, and Zen 4-based servers boot fine.
- a Zen 1-based server fails to boot.

This is what's left on its serial port:

[ 3.199633] smp: Bringing up secondary CPUs ...
[ 3.200732] x86: Booting SMP configuration:
[ 3.204242] .... node #0, CPUs: #1
[ 3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[ 3.204478] ------------[ cut here ]------------
[ 3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.204490] Modules linked in:
[ 3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[ 3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.204512] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.204514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.204517] Call Trace:
[ 3.204519] ---[ end trace 0000000000000000 ]---
[ 3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[ 3.288686] #2
[ 3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[ 3.288798] ------------[ cut here ]------------
[ 3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.288815] Modules linked in:
[ 3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[ 3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[ 3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[ 3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[ 3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[ 3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[ 3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[ 3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3.288850] FS: 0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[ 3.288852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[ 3.288857] Call Trace:
[ 3.288859] ---[ end trace 0000000000000000 ]---
[ 3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[ [ 3. 68 33]3 [ #3[ [ #
[ 3.368623[ 3
[ 3.368623] #3
[ 3.368662] ------------[ cut here ]------------
[ 3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[ 3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[ 3.368708] 7 locks held by swapper/0/1:
[ 3.368710] #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[ 3.368721] #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[ 3.368728] #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[ 3.368735] #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[ 3.368745] #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[ 3.368753] #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[ 3.368760] #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[ 3.368769] Modules linked in:
[ 3.368770] Preemption disabled at:
[ 3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[ 3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[ 3.368782] Call Trace:
[ 3.368783] <TASK>
[ 3.368789] dump_stack_lvl+0x49/0x63
[ 3.368795] ? do_cpu_up+0x3e4/0x780
[ 3.368799] dump_stack+0x10/0x16
[ 3.368802] __schedule_bug+0xad/0xd0
[ 3.368808] __schedule+0x76/0x8a0
[ 3.368812] ? sched_clock+0x9/0x10
[ 3.368817] ? sched_clock_local+0x17/0x90
[ 3.368826] ? sort_range+0x30/0x30
[ 3.368830] schedule+0x88/0xd0
[ 3.368833] schedule_timeout+0x40/0x320
[ 3.368840] ? __this_cpu_preempt_check+0x13/0x20
[ 3.368844] ? lock_release+0x353/0x3c0
[ 3.368852] ? sort_range+0x30/0x30
[ 3.368856] wait_for_completion_killable+0xe0/0x1c0
[ 3.368864] __kthread_create_on_node+0xfe/0x1e0
[ 3.368876] ? wait_for_completion_killable+0x38/0x1c0
[ 3.368884] kthread_create_on_node+0x46/0x70
[ 3.368894] kthread_create_on_cpu+0x2c/0x90
[ 3.368899] __smpboot_create_thread+0x87/0x140
[ 3.368905] smpboot_create_threads+0x3f/0x80
[ 3.368909] ? idle_thread_get+0x40/0x40
[ 3.368913] cpuhp_invoke_callback+0x13c/0x5d0
[ 3.368921] __cpuhp_invoke_callback_range+0x69/0xf0
[ 3.368929] _cpu_up+0x12a/0x2f0
[ 3.368937] cpu_up+0x8f/0xd0
[ 3.368942] bringup_nonboot_cpus+0x7c/0x160
[ 3.368950] smp_init+0x2a/0x83
[ 3.368957] kernel_init_freeable+0x1a1/0x309
[ 3.368961] ? lock_release+0x353/0x3c0
[ 3.368972] ? rest_init+0x140/0x140
[ 3.368977] kernel_init+0x1a/0x130
[ 3.368980] ret_from_fork+0x22/0x30
[ 3.368996] </TASK>
[ 3.369419]
[ 3.369420] .... node #1, CPUs: #4
[ 3.369466] ------------[ cut here ]------------
[ 3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[ 3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[ 3.369487] Modules linked in:
[ 3.369491] ------------[ cut here ]------------
[ 3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[ 3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.2.0-rc6+ #19
[ 3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018


...which points to the WARN_ON in wait_for_master_cpu(), called from cpu_init():

static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
	/*
	 * wait for ACK from master CPU before continuing
	 * with AP initialization
	 */
	WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
		cpu_relax();
#endif
}
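For anyone reading along, here is a minimal userspace model of the check above. It assumes, as the log suggests but does not prove, that the APs end up resolving themselves to CPU 0: every WARNING is reported from "CPU: 0 PID: 1", and each is followed by "CPU0: APIC id mismatch" with a non-zero APIC ID. If so, the test-and-set finds bit 0 already claimed by the boot CPU. C11 atomics stand in for the kernel's cpumask helpers; model_initialized_mask, model_test_and_set_cpu() and model_ap_checkin() are hypothetical names, not kernel APIs, and only the WARN_ON is modelled, not the cpu_callout_mask spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model, not kernel code: one 64-bit word stands
 * in for cpumask_t, so at most 64 CPUs are modelled. */
#define MODEL_NR_CPUS 64

/* Stand-in for cpu_initialized_mask. */
static _Atomic unsigned long long model_initialized_mask;

/* Analogue of cpumask_test_and_set_cpu(): atomically set this CPU's bit
 * and report whether it was already set. */
static bool model_test_and_set_cpu(unsigned int cpu)
{
	unsigned long long bit;

	assert(cpu < MODEL_NR_CPUS);
	bit = 1ULL << cpu;
	/* atomic_fetch_or() returns the mask as it was before the OR. */
	return (atomic_fetch_or(&model_initialized_mask, bit) & bit) != 0;
}

/* Analogue of the WARN_ON() in wait_for_master_cpu(): returns true when
 * the warning would fire, i.e. this CPU number was already claimed. */
static bool model_ap_checkin(unsigned int cpu)
{
	bool already = model_test_and_set_cpu(cpu);

	if (already)
		fprintf(stderr, "WARNING: CPU %u already in initialized mask\n", cpu);
	return already;
}
```

Calling model_ap_checkin(0) for the boot CPU and then model_ap_checkin(1) for a correctly identified AP both return false; a second model_ap_checkin(0), standing in for an AP that resolved itself to CPU 0, returns true and prints the warning.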

Let me know if you'd like me to test any changes.

Thanks,

Kim