Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2

From: Ake Koomsin
Date: Thu Aug 24 2023 - 23:59:19 EST


On Wed Aug 9, 2023 at 8:48 AM JST, Sean Christopherson wrote:
> Thank you for the detailed repro steps! However, it's likely going to be O(weeks)
> before anyone is able to look at this in detail given the extensive repro steps.
> If you have bandwidth, it's probably worth trying to reproduce the problem in a
> KVM selftest (or a KVM-Unit-Test), e.g. create a nested VM, send an IPI from L2,
> and see if it gets routed correctly. This purely a suggestion to try and get a
> faster fix, it's by no means necessary.

Hi

I have tried KVM Unit Test and want to report back the result.

Note 1: BitVisor does not let L2 guest see any KVM features in CPUID. It aims
to run on real hardware. The L2 guest will not be aware that it runs on a
hypervisor.

Note 2: BitVisor stops monitoring APIC access once it detects INIT in APIC
ICR write. bringup_aps() in lib/x86/smp.c unconditionally does INIT and SIPI
even though SMP=1. This is actually good to test direct physical APIC access
from L2 guest I think. BitVisor stops monitoring APIC access after
bringup_aps() is called. No APIC access goes to L1 BitVisor after this.

=== Procedure to chain-loading BitVisor and apic.efi from the unit test ===

1) 'hg clone http://hg.code.sf.net/p/bitvisor/code bitvisor'

2) Compile BitVisor by running 'make' command. The default config is ok.
'bitvisor.elf' is created at the project root directory after compilation
is done.

3) Apply the following patch to BitVisor. This is to make loadvmm.efi load
BitVisor following by apic.efi

-------------------------------------------------------------------------------
diff --git a/boot/uefi-loader/loadvmm.c b/boot/uefi-loader/loadvmm.c
--- a/boot/uefi-loader/loadvmm.c
+++ b/boot/uefi-loader/loadvmm.c
@@ -212,5 +212,47 @@ efi_main (EFI_HANDLE image, EFI_SYSTEM_T
file->Close (file);
if (!boot_error)
return EFI_LOAD_ERROR;
+
+ static CHAR16 apic_path[4096];
+ EFI_HANDLE apic_image;
+ UINT32 npages;
+ create_file_path (loaded_image->FilePath, L"apic.efi", apic_path,
+ sizeof apic_path / sizeof apic_path[0]);
+ status = fileio->OpenVolume (fileio, &file);
+ if (EFI_ERROR (status)) {
+ print (systab, L"OpenVolume ", status);
+ return status;
+ }
+ status = file->Open (file, &file2, apic_path, EFI_FILE_MODE_READ, 0);
+ if (EFI_ERROR (status)) {
+ print (systab, L"Open ", status);
+ return status;
+ }
+ /* apic.efi is about 1.2MB at the time of test, ~300 pages */
+ npages = 300;
+ status = systab->BootServices->AllocatePages (AllocateMaxAddress,
+ EfiLoaderData, npages,
+ &paddr);
+ if (EFI_ERROR (status)) {
+ print (systab, L"AllocatePages ", status);
+ return status;
+ }
+ readsize = npages * 4096;
+ status = file2->Read (file2, &readsize, (void *)paddr);
+ if (EFI_ERROR (status)) {
+ print (systab, L"Read ", status);
+ return status;
+ }
+ status = systab->BootServices->LoadImage (TRUE, image, NULL, (void *)paddr, readsize, &apic_image);
+ if (EFI_ERROR (status)) {
+ print (systab, L"LoadImage ", status);
+ return status;
+ }
+ status = systab->BootServices->StartImage (apic_image, NULL, NULL);
+ if (EFI_ERROR (status)) {
+ print (systab, L"StartImage ", status);
+ return status;
+ }
+
return EFI_SUCCESS;
}
-------------------------------------------------------------------------------

4) Change directory to /path/to/bitvsor/boot/uefi-loader. Compile 'loadvmm.efi'
by running 'make' command. Mingw64 is required to compiled the loader.
Modify loadvmm.efi's Makefile to set your 'EXE_CC' if necessary.

5) Apply the following patch to KVM Unit Test code to copy the above
loadvmm.efi as BOOTX64.EFI and make sure that bitvisor.elf and apic.efi are
in the same folder as loadvmm.efi. The patch is dirty but it gets the job
done. Replace '/path/to/loadvmm.efi' and '/path/to/bitvisor.elf' to match
your testing environment.

-------------------------------------------------------------------------------
diff --git a/x86/efi/run b/x86/efi/run
index 85aeb94..fefb3cc 100755
--- a/x86/efi/run
+++ b/x86/efi/run
@@ -42,6 +42,10 @@ fi

mkdir -p "$EFI_CASE_DIR"
cp "$EFI_SRC/$EFI_CASE.efi" "$EFI_CASE_BINARY"
+cp "/path/to/loadvmm.efi" "$EFI_CASE_BINARY"
+cp "/path/to/bitvisor.elf" "$EFI_CASE_DIR/"
+cp "$EFI_SRC/$EFI_CASE.efi" "$EFI_CASE_DIR/$EFI_CASE.efi"

# Run test case with 256MiB QEMU memory. QEMU default memory size is 128MiB.
# After UEFI boot up and we call `LibMemoryMap()`, the largest consecutive
-------------------------------------------------------------------------------

6) The following bad hack is probably needed to avoid stall when testing with
EFI_SMP > 1

-------------------------------------------------------------------------------
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index b9b91c7..ba74321 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -279,6 +279,9 @@ void bringup_aps(void)
/* INIT */
apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);

+ for(int i = 0; i < 30000000; i++)
+ cpu_relax();
+
/* SIPI */
apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP, 0);

-------------------------------------------------------------------------------

7) Compile KVM Unit Test with EFI enabled and run KVM Unit Test with the
following command:

./x86/efi/run apic.efi -cpu host -m 2048M

The following section is the report from testing on my machine

CPU: 13th Gen Intel i5-13600K (20) @ 5.100GHz
Kernel: Latest kvm.git, default config
QEMU Version: 8.0.4

=== enable_apicv=N and EFI_SMP=1 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494489584 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project. All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6b7000
paging enabled
cr0 = 80010021
cr3 = 153f000
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494527968 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project. All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6b7000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
PASS: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
PASS: APIC physical broadcast address
PASS: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: pending nmi
PASS: APIC LVT timer one shot
starting apic change mode
PASS: TMICT value reset
PASS: TMCCT should have a non-zero value
PASS: TMCCT should have reached 0
PASS: TMCCT should have a non-zero value
PASS: TMCCT should not be reset to TMICT value
PASS: TMCCT should be reset to the initial-count
PASS: TMCCT should not be reset to init
PASS: TMCCT should have reach zero
PASS: TMCCT should stay at zero
PASS: tsc deadline timer
PASS: tsc deadline timer clearing
PASS: apicbase: relocate apic
PASS: apicbase: reserved physaddr bits
PASS: apicbase: reserved low bits
SUMMARY: 48 tests

=== enable_apicv=N and EFI_SMP=2 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494511056 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project. All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 1 APs
... Likely need bad hack in step 6 to continue ...
enabling apic
setup: CPU 1 online
Address of image is: 0x7e6b9000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
PASS: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
PASS: APIC physical broadcast address
PASS: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: nmi-after-sti
FAIL: multiple nmi
PASS: pending nmi
PASS: APIC LVT timer one shot
starting apic change mode
PASS: TMICT value reset
PASS: TMCCT should have a non-zero value
PASS: TMCCT should have reached 0
PASS: TMCCT should have a non-zero value
PASS: TMCCT should not be reset to TMICT value
PASS: TMCCT should be reset to the initial-count
PASS: TMCCT should not be reset to init
PASS: TMCCT should have reach zero
PASS: TMCCT should stay at zero
PASS: tsc deadline timer
PASS: tsc deadline timer clearing
PASS: xapic id matches cpuid
PASS: writeable xapic id
PASS: non-writeable x2apic id
PASS: sane x2apic id
PASS: x2apic id matches cpuid
PASS: correct xapic id after reset
PASS: apicbase: relocate apic
PASS: apicbase: reserved physaddr bits
PASS: apicbase: reserved low bits
SUMMARY: 56 tests, 1 unexpected failures

=== enable_apicv=Y and EFI_SMP=1 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x3,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494569088 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project. All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PCI: 6 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 0 APs
Address of image is: 0x7e6c0000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
apic version: 14
PASS: apic existence
PASS: apic_disable: Local apic disabled
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is clear
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
PASS: apic_disable: Local apic enabled in xAPIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: 50014
PASS: apic_disable: *0xfee00080: 0
PASS: apic_disable: *0xfee00080: f0
PASS: apic_disable: Local apic enabled in x2APIC mode
PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
PASS: apic_disable: *0xfee00030: ffffffff
PASS: apic_disable: CR8: 0
PASS: apic_disable: CR8: f
PASS: apic_disable: *0xfee00080: ffffffff
x2apic enabled
PASS: x2apic enabled to invalid state
PASS: x2apic enabled to apic enabled
PASS: x2apic enabled to disabled state
PASS: disabled to invalid state
PASS: disabled to x2apic enabled
PASS: apic disabled to apic enabled
PASS: apic enabled to invalid state
PASS: self_ipi_xapic: Local apic enabled in xAPIC mode
PASS: self_ipi_xapic: self ipi
PASS: self_ipi_x2apic: Local apic enabled in x2APIC mode
FAIL: self_ipi_x2apic: self ipi
starting broadcast (x2apic)
FAIL: APIC physical broadcast address
FAIL: APIC physical broadcast shorthand
PASS: PV IPIs testing
PASS: pending nmi
...Stall...

=== enable_apicv=Y and EFI_SMP=2 report ===

BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0)
Loading ...............................................................
Starting BitVisor...
Copyright (c) 2007, 2008 University of Tsukuba
All rights reserved.
ACPI DMAR not found.
FACS address 0x7FBDD000
Module not found.
Processor 0 (BSP)
oooooooooooooooooooooooooooooooooooooooooooooooooo
Disable ACPI S3
Using VMX.
Processor 0 3494574368 Hz
Loading drivers.
AES/AES-XTS Encryption Engine initialized (AES=openssl)
Copyright (c) 1998-2002 The OpenSSL Project. All rights reserved.
Generic ATA/ATAPI para pass-through driver 0.4 registered
Generic AHCI para pass-through driver registered
Generic RAID para pass-through driver registered
Generic IEEE1394 para pass-through driver 0.1 registered
Aquantia AQC107 Ethernet Driver registered
Broadcom NetXtreme Gigabit Ethernet Driver registered
VPN for Intel PRO/100 registered
Intel PRO/1000 driver registered
Realtek Ethernet Driver registered
virtio-net virtual driver registered
NVMe para pass-through driver registered
NVMe para pass-through driver registered
PCI device concealer registered
PCI device monitor registered
Generic EHCI para pass-through driver 0.9 registered
Generic EHCI para pass-through driver 0.9 registered
Generic UHCI para pass-through driver 1.0 registered
xHCI para pass-through driver 0.1 registered
Intel Corporation Ethernet Controller 10 Gigabit X540 Driver registered
PCI: finding devices...
PRO/1000 found.
MAC Address: 52:54:00:12:34:56
core_io_unregister_handler: port: c180-c19f
Uninstall 2 protocol(s) successfully
[0000:00:02.0] Disconnected PCI device drivers
Wait for PHY reset and link setup completion.
PCI: 7 devices found
Starting a virtual machine.
enabling apic
smp: waiting for 1 APs
... Likely need bad hack in step 6 to continue ...
enabling apic
setup: CPU 1 online
Address of image is: 0x7e461000
paging enabled
cr0 = 80010021
cr3 = 153f000
cr4 = 628
...Stall...

When enable_apicv=N, it can complete the test while with enable_apicv=Y, it
cannot.

I am not sure if I violate any assumption KVM Unit Test made by doing this
experiment. However, I think it is worth reporting.

Feel free to ask me if there are problems when trying to reproduce the
experiment or you need more info.


Best Regards
Ake Koomsin