RE: [PATCH] KVM: LAPIC: Per vCPU control over kvm_can_post_timer_interrupt

From: yaoaili [么爱利]
Date: Mon Nov 22 2021 - 23:28:53 EST


> On Tue, 23 Nov 2021 at 03:14, Sean Christopherson <seanjc@xxxxxxxxxx>
> wrote:
> >
> > On Mon, Nov 22, 2021, Aili Yao wrote:
> > > From: Aili Yao <yaoaili@xxxxxxxxxxxx>
> > >
> > > When we isolate some physical cores, we may not use them for kvm
> > > guests, we may use them for other purposes like DPDK, or we can make
> > > some kvm guests isolated and some not, the global judgement
> > > pi_inject_timer is not enough; we may make wrong decisions:
> > >
> > > In such a scenario, the guests without isolated cores will not be
> > > permitted to use vmx preemption timer, and tscdeadline fastpath also
> > > be disabled, both will lead to performance penalty.
> > >
> > > So check whether the vcpu->cpu is isolated, if not, don't post timer
> > > interrupt.
> > >
> > > Signed-off-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> > > ---
> > > arch/x86/kvm/lapic.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 759952dd1222..72dde5532101 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -34,6 +34,7 @@
> > > #include <asm/delay.h>
> > > #include <linux/atomic.h>
> > > #include <linux/jump_label.h>
> > > +#include <linux/sched/isolation.h>
> > > #include "kvm_cache_regs.h"
> > > #include "irq.h"
> > > #include "ioapic.h"
> > > @@ -113,7 +114,8 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > >
> > > static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu) {
> > > - return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
> > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > + !housekeeping_cpu(vcpu->cpu, HK_FLAG_TIMER);
> >
> > I don't think this is safe, vcpu->cpu will be -1 if the vCPU isn't scheduled in.

Yes, vcpu->cpu is -1 before the vCPU is created, but in my environment this
did not trigger the issue. I need to dig into it more, thanks!
I probably need a validity check here.
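
A rough userspace sketch of what such a validity check could look like. This
is not kernel code: struct hk_model, model_housekeeping_cpu() and
model_can_post_timer_interrupt() are hypothetical stand-ins for the
housekeeping mask, housekeeping_cpu() and kvm_can_post_timer_interrupt(), and
falling back to "don't post" for an invalid vcpu->cpu is only an assumption
about the desired behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: bit N set in housekeeping_mask means CPU N is a
 * housekeeping (non-isolated) CPU. nr_cpus must be smaller than the
 * number of bits in unsigned long. */
struct hk_model {
	unsigned long housekeeping_mask;
	int nr_cpus;
};

/* Stand-in for housekeeping_cpu(); the caller must pass a valid CPU id. */
static bool model_housekeeping_cpu(const struct hk_model *hk, int cpu)
{
	assert(cpu >= 0 && cpu < hk->nr_cpus);
	return (hk->housekeeping_mask >> cpu) & 1UL;
}

/* Stand-in for the patched predicate, with an explicit range check so
 * that vcpu_cpu == -1 (vCPU not loaded yet) never reaches the mask. */
static bool model_can_post_timer_interrupt(const struct hk_model *hk,
					   bool pi_inject_timer,
					   bool apicv_active, int vcpu_cpu)
{
	if (!pi_inject_timer || !apicv_active)
		return false;

	/* Assumption: with no valid CPU, conservatively do not post. */
	if (vcpu_cpu < 0 || vcpu_cpu >= hk->nr_cpus)
		return false;

	return !model_housekeeping_cpu(hk, vcpu_cpu);
}
```

With CPUs 0-1 as housekeeping and CPUs 2-3 isolated, only a vCPU currently on
CPU 2 or 3 would be allowed to post, and vcpu->cpu == -1 is rejected before
any mask lookup.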

> > This also doesn't play nice with the admin forcing pi_inject_timer=1.
> > Not saying there's a reasonable use case for doing that, but it's
> > supported today and this would break that behavior. It would also
> > lead to weird behavior if a vCPU were migrated on/off a housekeeping
> > vCPU. Again, probably not a reasonable use case, but I don't see anything
> > that would outright prevent that behavior.

Yes, this is not a common operation, but I did test some scenarios:
1. isolated CPU --> housekeeping CPU:
   The isolated guest's timer is on a housekeeping CPU. On migration,
   kvm_can_post_timer_interrupt() returns false, so the timer may be
   migrated to vcpu->cpu.
   This seems to work in my test.
2. isolated CPU --> isolated CPU:
   The isolated guest's timer is on a housekeeping CPU. On migration,
   kvm_can_post_timer_interrupt() returns true, so the timer is not migrated.
3. housekeeping CPU --> isolated CPU:
   The non-isolated guest's timer is usually on vcpu->cpu. On migration to an
   isolated CPU, kvm_can_post_timer_interrupt() returns true, so the timer
   remains on the same CPU.
   This seems to work in my test.
4. housekeeping CPU --> housekeeping CPU:
   The timer is migrated.
So this does not seem to be a real problem.
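
The four cases can be condensed into a small userspace model. It assumes
pi_inject_timer and APICv are both enabled, that vcpu->cpu already names the
destination CPU when the check runs, and that the timer is moved only when
kvm_can_post_timer_interrupt() returns false (as in the early return of
__kvm_migrate_apic_timer()). enum cpu_kind, can_post_on() and
timer_migrates() are hypothetical names used only for this sketch.

```c
#include <stdbool.h>

enum cpu_kind { CPU_ISOLATED, CPU_HOUSEKEEPING };

/* Models the patched predicate with pi_inject_timer and APICv enabled:
 * posting is allowed only while the vCPU sits on an isolated CPU. */
static bool can_post_on(enum cpu_kind dest)
{
	return dest == CPU_ISOLATED;
}

/* Models the migration decision: the timer follows the vCPU only when
 * the timer interrupt cannot be posted on the destination CPU. The
 * source CPU kind does not influence the outcome. */
static bool timer_migrates(enum cpu_kind src, enum cpu_kind dest)
{
	(void)src;
	return !can_post_on(dest);
}
```

In this model the outcome depends only on the destination: the timer is moved
in cases 1 and 4 and left in place in cases 2 and 3.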

> >
> > The existing behavior also feels a bit unsafe as pi_inject_timer is
> > writable while KVM is running, though I supposed that's orthogonal to this
> > discussion.
> >
> > Rather than check vcpu->cpu, is there an existing vCPU flag that can
> > be queried, e.g. KVM_HINTS_REALTIME?
>
> How about something like below:
>
> From 67f605120e212384cb3d5788ba8c83f15659503b Mon Sep 17 00:00:00 2001
> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> Date: Tue, 23 Nov 2021 10:36:10 +0800
> Subject: [PATCH] KVM: LAPIC: To keep the vCPUs in non-root mode for timer-pi
>
> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> As commit 0c5f81dad46 (KVM: LAPIC: Inject timer interrupt via posted
> interrupt) mentioned that the host admin should well tune the guest setup,
> so that vCPUs are placed on isolated pCPUs, and with several pCPUs surplus
> for *busy* housekeeping.
> It is better to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root
> mode. However, we may isolate pCPUs for other purpose like DPDK or we
> can make some guests isolated and others not, Let's add the checking
> kvm_mwait_in_guest() to kvm_can_post_timer_interrupt() since we can't
> benefit from timer posted-interrupt w/o keeping the vCPUs in non-root
> mode.
>
> Reported-by: Aili Yao <yaoaili@xxxxxxxxxxxx>
> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> ---
> arch/x86/kvm/lapic.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 759952dd1222..8257566d44c7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -113,14 +113,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>
> static bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu) {
> - return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
> + return pi_inject_timer && kvm_mwait_in_guest(vcpu->kvm) && kvm_vcpu_apicv_active(vcpu);
> }
>
> bool kvm_can_use_hv_timer(struct kvm_vcpu *vcpu) {
> return kvm_x86_ops.set_hv_timer
> - && !(kvm_mwait_in_guest(vcpu->kvm) ||
> - kvm_can_post_timer_interrupt(vcpu));
> + && !kvm_mwait_in_guest(vcpu->kvm);
> }
> EXPORT_SYMBOL_GPL(kvm_can_use_hv_timer);

This method seems quicker and safer, but I have one question: can
kvm_mwait_in_guest() guarantee that the CPU is isolated? In some production
environments the MWAIT feature is usually disabled on the host, even for
guests with isolated CPUs. Also, kvm_mwait_in_guest can be set to true for
guests whose CPUs are just pinned, not isolated.
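
To illustrate the concern, here is a small userspace model of the
mwait-based predicate. struct guest_model and model_can_post_mwait_variant()
are hypothetical names, and the on_isolated_cpus field exists only to show
that the predicate never looks at it.

```c
#include <stdbool.h>

/* Hypothetical per-guest model. on_isolated_cpus describes the host
 * setup; the predicate below deliberately has no access to it. */
struct guest_model {
	bool mwait_in_guest;
	bool apicv_active;
	bool on_isolated_cpus;
};

/* Models the proposed check:
 * pi_inject_timer && kvm_mwait_in_guest() && kvm_vcpu_apicv_active(). */
static bool model_can_post_mwait_variant(bool pi_inject_timer,
					 const struct guest_model *g)
{
	return pi_inject_timer && g->mwait_in_guest && g->apicv_active;
}
```

In this model a guest that is pinned but not isolated, with MWAIT exposed,
is allowed to post, while an isolated guest with MWAIT disabled is not.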

Thanks