Re: Re: [RFC] KVM: x86: SVM: don't expose PV_SEND_IPI feature with AVIC

From: zhenwei pi
Date: Mon Nov 08 2021 - 06:16:49 EST


On 11/8/21 7:08 PM, Maxim Levitsky wrote:
On Mon, 2021-11-08 at 11:30 +0100, Paolo Bonzini wrote:
On 11/8/21 10:59, Kele Huang wrote:
Currently, AVIC is disabled if the x2apic feature is exposed to the guest
or the in-kernel PIT is in re-injection mode.

We can enable AVIC with options:

Kmod args:
modprobe kvm_amd avic=1 nested=0 npt=1
QEMU args:
... -cpu host,-x2apic -global kvm-pit.lost_tick_policy=discard ...

When the LAPIC works in xapic mode, both AVIC and the PV_SEND_IPI feature
can accelerate IPI operations for the guest. However, the relationship
between AVIC and the PV_SEND_IPI feature has not been sorted out.

Logically, AVIC accelerates the most frequent IPI operations
without VMM intervention, while the re-hooking of apic->send_IPI_xxx
by the PV_SEND_IPI feature masks it out. People can get confused
if AVIC is enabled yet they still see lots of hypercall kvm_exits
from IPIs.
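
To illustrate the re-hooking described above, here is a minimal user-space
sketch. It is a model only, not the kernel code: struct apic_model, the two
counters and setup_pv_ipi() are hypothetical names, and only one send_IPI
hook is modelled. It shows why the native xAPIC path (the one AVIC can
accelerate) is no longer reached once the PV hook is installed.

```c
/* Hypothetical counters standing in for the two IPI delivery paths. */
static unsigned int icr_writes;    /* native xAPIC ICR writes (AVIC can accelerate these) */
static unsigned int pv_hypercalls; /* PV_SEND_IPI hypercalls (each one causes a VM exit) */

/* Simplified stand-in for the kernel's struct apic; one hook only. */
struct apic_model {
	void (*send_IPI_mask)(unsigned long cpu_mask, int vector);
};

/* Native path: models a write to the xAPIC ICR. */
static void native_send_IPI_mask(unsigned long cpu_mask, int vector)
{
	(void)cpu_mask;
	(void)vector;
	icr_writes++;
}

/* PV path: models the hypercall used by the PV_SEND_IPI feature. */
static void pv_send_IPI_mask(unsigned long cpu_mask, int vector)
{
	(void)cpu_mask;
	(void)vector;
	pv_hypercalls++;
}

static struct apic_model apic = { .send_IPI_mask = native_send_IPI_mask };

/* Models guest setup: the hook is replaced only if the feature is exposed. */
static void setup_pv_ipi(int guest_has_pv_send_ipi)
{
	if (guest_has_pv_send_ipi)
		apic.send_IPI_mask = pv_send_IPI_mask;
}
```

After setup_pv_ipi(1), every apic.send_IPI_mask() call in this model goes
through the hypercall counter, regardless of whether AVIC is enabled.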

In terms of performance, the benchmark tool
https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@xxxxxxxxxxxxxxxxxx/
shows the results below:

Test env:
CPU: AMD EPYC 7742 64-Core Processor
2 vCPUs pinned 1:1
idle=poll

Test result (average ns per IPI over many runs):
PV_SEND_IPI : 1860
AVIC : 1390
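
For reference, the two averages above differ by 470 ns per IPI, i.e. AVIC
is roughly 25% lower than PV_SEND_IPI in this test. A small sketch of that
arithmetic follows; latency_reduction_pct() is a hypothetical helper, not
part of the benchmark tool.

```c
/*
 * Relative latency reduction of fast_ns compared to slow_ns, in percent.
 * Integer division truncates the result. Returns 0 for degenerate input.
 */
static unsigned int latency_reduction_pct(unsigned int slow_ns,
					  unsigned int fast_ns)
{
	if (slow_ns == 0 || fast_ns > slow_ns)
		return 0;

	return (slow_ns - fast_ns) * 100 / slow_ns;
}
```

With the numbers above, latency_reduction_pct(1860, 1390) evaluates to 25.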

Besides, the discussion in https://lkml.org/lkml/2021/10/20/423
also has some solid performance test results on this.

This patch fixes this by masking out the PV_SEND_IPI feature when
AVIC is enabled while setting up the guest vCPUs' CPUID.

Signed-off-by: Kele Huang <huangkele@xxxxxxxxxxxxx>

AVIC can change across migration. I think we should instead use a new
KVM_HINTS_* bit (KVM_HINTS_ACCELERATED_LAPIC or something like that).
The KVM_HINTS_* bits are intended to be changeable across migration,
even though for now we don't have anything equivalent to the Hyper-V
reenlightenment interrupt.
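
A rough sketch of how a guest could consume such a hint, assuming it is
reported next to KVM_HINTS_REALTIME in EDX of the KVM_CPUID_FEATURES leaf.
KVM_HINTS_ACCELERATED_LAPIC does not exist today; the name comes from the
suggestion above, and its bit number and guest_should_use_pv_ipi() are
assumptions made purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_SEND_IPI      11 /* bit in CPUID 0x40000001 EAX */
#define KVM_HINTS_REALTIME            0 /* existing hint bit in CPUID 0x40000001 EDX */
#define KVM_HINTS_ACCELERATED_LAPIC   1 /* HYPOTHETICAL hint; bit number is an assumption */

/*
 * Decide whether the guest should install the PV IPI hooks: only when the
 * feature is exposed and the host does not hint that the LAPIC is already
 * hardware-accelerated (e.g. by AVIC).
 */
static bool guest_should_use_pv_ipi(uint32_t features_eax, uint32_t hints_edx)
{
	bool has_pv_ipi = (features_eax & (1u << KVM_FEATURE_PV_SEND_IPI)) != 0;
	bool lapic_accel = (hints_edx & (1u << KVM_HINTS_ACCELERATED_LAPIC)) != 0;

	return has_pv_ipi && !lapic_accel;
}
```

With this approach the feature bit itself stays stable across migration,
and only the hint changes when the destination host has or lacks AVIC.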

Note that the same issue exists with Hyper-V. It also has a PV APIC,
which is harmful when AVIC is enabled (that is, the guest uses it instead
of AVIC, negating AVIC's benefits).

Also note that Intel recently posted IPI virtualization, which
will soon make this issue relevant to APICv too.

I don't yet know if there is a solution to this which doesn't
involve some management software decision (e.g. libvirt or higher).

Best regards,
Maxim Levitsky


For QEMU, "-cpu host,kvm-pv-ipi=off" can disable kvm-pv-ipi.
And for libvirt, I posted a patch to disable kvm-pv-ipi via the libvirt XML:
https://github.com/libvirt/libvirt/commit/b2757b697e29fa86972a4638a5879dccc8add2ad


Paolo

---
arch/x86/kvm/cpuid.c | 4 ++--
arch/x86/kvm/svm/svm.c | 13 +++++++++++++
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2d70edb0f323..cc22975e2ac5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -194,8 +194,6 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
best->ecx |= XFEATURE_MASK_FPSSE;
}
- kvm_update_pv_runtime(vcpu);
-
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
@@ -208,6 +206,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
/* Invoke the vendor callback only after the above state is updated. */
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
+ kvm_update_pv_runtime(vcpu);
+
/*
* Except for the MMU, which needs to do its thing any vendor specific
* adjustments to the reserved GPA bits.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b36ca4e476c2..b13bcfb2617c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4114,6 +4114,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
kvm_request_apicv_update(vcpu->kvm, false,
APICV_INHIBIT_REASON_NESTED);
+
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
+ !(nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))) {
+ /*
+ * PV_SEND_IPI feature masks out AVIC acceleration to IPI.
+ * So, we do not expose PV_SEND_IPI feature to guest when
+ * AVIC is enabled.
+ */
+ best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
+ if (best && enable_apicv &&
+ (best->eax & (1 << KVM_FEATURE_PV_SEND_IPI)))
+ best->eax &= ~(1 << KVM_FEATURE_PV_SEND_IPI);
+ }
}
init_vmcb_after_set_cpuid(vcpu);
}




--
zhenwei pi