Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

From: Tom Lendacky
Date: Thu Dec 16 2021 - 17:52:45 EST


On 12/16/21 1:24 PM, David Woodhouse wrote:
> On Thu, 2021-12-16 at 10:27 -0600, Tom Lendacky wrote:
>> On 12/15/21 8:56 AM, David Woodhouse wrote:
>>> Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
>>> them shaves about 80% off the AP bringup time on a 96-thread socket
>>> Skylake box (EC2 c5.metal), from about 500ms to 100ms.
>>>
>>> There are more wins to be had with further parallelisation, but this is
>>> the simple part.
>>
>> I applied this series and began booting a regular non-SEV guest and hit a
>> failure at 39 vCPUs. No panic or warning, just a reset and OVMF was
>> executing again. I'll try to debug what's going on, but I'm not sure how
>> quickly I'll arrive at anything.
>
> Thanks for testing. This is working for me with BIOS and EFI boots in
> qemu and real hardware but it's mostly been Intel so far. I'll try
> harder on an AMD box.

On bare metal, I haven't seen an issue. The problem only seems to occur under Qemu/KVM.

With 191f08997577 I could boot without issues, both with and without no_parallel_bringup. The failure appeared only after I applied e78fa57dd642.

With e78fa57dd642 I could boot 64 vCPUs pretty consistently, but when I jumped to 128 vCPUs it failed again. When I moved the series to df9726cb7178, then 64 vCPUs also failed pretty consistently.

The strange thing is that it is random. Sometimes (rarely) it works on the first boot; otherwise it resets and reboots 3 or 4 times before it makes it past the failure and fully boots.


> Anything else special about your setup, kernel config or qemu
> invocation that might help me reproduce?

Shouldn't be anything special that I'm aware of:
- EPYC 3rd Gen (Milan)
- Qemu 6.1.0
- OVMF edk2-stable202111

The qemu command line is:
qemu-system-x86_64 -enable-kvm -cpu EPYC,host-phys-bits=true \
    -smp 128 -m 1G -machine type=q35 \
    -drive if=pflash,format=raw,unit=0,file=/root/kernels/qemu-install/OVMF_CODE.fd,readonly=on \
    -drive if=pflash,format=raw,unit=1,file=./diskless.fd \
    -nographic \
    -kernel /root/kernels/linux-build-x86_64/arch/x86/boot/bzImage \
    -append "console=ttyS0,115200n8" \
    -monitor pty -monitor unix:monitor,server,nowait

I can send the kernel config to you offlist if you're unable to repro with yours.


> If it can repro without KVM, 'qemu -d in_asm' can be extremely useful
> for this kind of thing btw.

I didn't repro the failure without KVM.

Thanks,
Tom
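
For readers following the thread, here is a toy timing model of the idea in the quoted cover letter: send INIT/SIPI/SIPI to every AP first and only then wait, instead of kicking and waiting one AP at a time. It is not kernel code. The function names are hypothetical, and the AP count (95, assuming 96 threads minus the boot CPU) and the per-AP kick and wake-up latencies are made-up figures, chosen only so the totals land near the roughly 500ms and 100ms quoted above.

```python
def serial_bringup_ms(num_aps, kick_ms, wake_ms):
    """Kick one AP, wait for it to come up, then move on to the next."""
    return num_aps * (kick_ms + wake_ms)


def parallel_bringup_ms(num_aps, kick_ms, wake_ms):
    """Kick every AP back to back, then wait; the wake-ups overlap.

    The last AP is kicked at num_aps * kick_ms and is up wake_ms later,
    so that is when the whole bringup finishes.
    """
    return num_aps * kick_ms + wake_ms


if __name__ == "__main__":
    # Hypothetical figures for illustration, not measurements.
    num_aps = 95
    kick_ms = 1.0   # assumed cost of sending INIT/SIPI/SIPI to one AP
    wake_ms = 4.0   # assumed time for an AP to come up after the kick

    serial = serial_bringup_ms(num_aps, kick_ms, wake_ms)
    parallel = parallel_bringup_ms(num_aps, kick_ms, wake_ms)
    print(f"serial:   {serial:.0f} ms")
    print(f"parallel: {parallel:.0f} ms")
    print(f"saving:   {100 * (1 - parallel / serial):.0f}%")
```

In this model the saving grows with the share of per-AP time spent waiting rather than kicking, which is consistent with the cover letter's remark that further parallelisation could yield more.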