Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

From: Alejandro Jimenez
Date: Tue Sep 27 2022 - 23:16:00 EST

On 9/20/2022 7:31 PM, Sean Christopherson wrote:
Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
it against the xAPIC ID to avoid false positives (sort of) on systems
with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
its original value.

The mismatch isn't technically a false positive, as architecturally the
xAPIC IDs do end up being aliased in this scenario, and neither APICv
nor AVIC correctly handles IPI virtualization when there is aliasing.
However, KVM already deliberately does not honor the aliasing behavior
that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
resulting APICv/AVIC behavior is aligned with KVM's existing behavior
when KVM's x2APIC hotplug hack is effectively enabled.

If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
piggyback whatever logic disables the optimized APIC map (which is what
provides the hotplug hack), i.e. so that KVM's optimized map and APIC
virtualization yield the same behavior.

For now, fix the immediate problem of APIC virtualization being disabled
for large VMs, which is a much more pressing issue than ensuring KVM
honors architectural behavior for APIC ID aliasing.

I built a host kernel with this entire series on top of mainline v6.0-rc6. Booting a guest with AVIC enabled works as expected on the initial boot, but during the first reboot AVIC is inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and from then on I see constant inhibit set/clear events due to APICV_INHIBIT_REASON_IRQWIN, as shown in these traces:

qemu-system-x86-10147 [222] ..... 1116.519052: kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
qemu-system-x86-10147 [222] ..... 1116.519063: kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20
qemu-system-x86-10147 [222] ..... 1117.934222: kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
qemu-system-x86-10147 [222] ..... 1117.934233: kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20

It happens regardless of vCPU count (tested with 2, 32, 255, 380, and 512 vCPUs). This state persists for all subsequent reboots, until the VM is terminated. For vCPU counts < 256, when x2apic is disabled the problem does not occur, and AVIC continues to work properly after reboots.
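
For readers decoding the inhibits= values in the trace above, here is a minimal sketch in plain C (not kernel code) that lists which bits are set in a mask. The helper name inhibit_bits() is hypothetical. Which bit number corresponds to which APICV_INHIBIT_REASON_* value depends on the enum in the tree being tested; going by the report above, reason=8 is the IRQWIN inhibit that toggles, and the bit that stays set is presumably APIC_ID_MODIFIED.

```c
#include <stddef.h>

/*
 * Hypothetical helper: store the index of every set bit of 'mask' into
 * 'bits' (at most 'max' entries) and return how many were stored.
 */
static size_t inhibit_bits(unsigned long mask, unsigned int *bits, size_t max)
{
	size_t n = 0;

	for (unsigned int i = 0; i < 8 * sizeof(mask) && n < max; i++) {
		if (mask & (1UL << i))
			bits[n++] = i;
	}
	return n;
}
```

Applied to the trace, 0x120 has bits 5 and 8 set and 0x20 has only bit 5 set, so the two values differ exactly by bit 8, which matches the reason=8 being set and cleared while bit 5 never goes away.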

I did not see this issue when testing a similar host kernel that did not include this current patchset, but instead applied the earlier:
https://lore.kernel.org/lkml/20220909195442.7660-1-suravee.suthikulpanit@xxxxxxx/
which inspired this [05/28] patch and the follow-up [22/28] in this series.

I am using QEMU built from v7.1.0 upstream tag, plus the patch at:
https://lore.kernel.org/qemu-devel/20220504131639.13570-1-suravee.suthikulpanit@xxxxxxx/

Please feel free to request any other data points that might be relevant and I'll try to collect them.

Alejandro

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/lapic.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index adac6ca9b7dc..a02defa3f7b5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2075,7 +2075,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
 		return;
 
-	if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
+	/*
+	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
+	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
+	 * value.
+	 */
+	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
 		return;
 
 	kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
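
The effect of the (u8) cast in the hunk above can be sketched in standalone C. This is an illustration only, not the kernel implementation: both function names are hypothetical, and it assumes the xAPIC ID is an 8-bit value while the vCPU ID is a 32-bit value, as the commit message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the pre-patch check: compare against the full 32-bit ID. */
static bool xapic_id_matches_untruncated(uint8_t xapic_id, uint32_t vcpu_id)
{
	return xapic_id == vcpu_id;
}

/* Stand-in for the patched check: compare against the low 8 bits only. */
static bool xapic_id_matches_truncated(uint8_t xapic_id, uint32_t vcpu_id)
{
	return xapic_id == (uint8_t)vcpu_id;
}
```

With a vCPU ID of 300, the low 8 bits are 44, so an xAPIC ID of 44 fails the untruncated comparison (which would trigger the inhibit) but passes the truncated one. For vCPU IDs below 256 the two checks agree.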