Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

From: Sean Christopherson
Date: Mon Sep 26 2022 - 13:38:50 EST


On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..6475c882b359 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> > * AVIC is disabled because SEV doesn't support it.
> > */
> > APICV_INHIBIT_REASON_SEV,
> > +
> > + /*
> > + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > + * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > + * independent controls for AVIC vs. x2AVIC, and also because SVM
> > + * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > + * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> > + * AVIC can still be activated, but KVM must not create SPTEs for the
> > + * APIC base. For simplicity, this is sticky.
> > + */
> > + APICV_INHIBIT_REASON_X2APIC,
>
> Hi Sean!
>
> So assuming that I won't object to making it SVM specific (I still think
> that VMX should also inhibit this memslot because this is closer to x86 spec,
> but if you really want it this way, I won't fight over it):

Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
reason to change KVM's behavior and risk silently causing a performance regression.
If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
this were the initial APICv implementation and thus had no possible users, then I would
probably also vote to give APICv the same treatment.

> I somewhat don't like this inhibit, because now it is used just to say
> 'I am AVIC'.
>
> What do you think if you just move the code that removes the memslot to SVM,
> to avic_set_virtual_apic_mode?

Suffers the same SRCU issue (see below) :-/

Given the SRCU problem, I'd prefer to keep the management of the memslot in common
code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
for dealing with the SRCU issue, i.e. handling this in AVIC code would require
another hook on top of spreading the memslot management across x86 and SVM code.

> > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> > struct kvm_apic_map __rcu *apic_map;
> > atomic_t apic_map_dirty;
> >
> > - /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > - struct rw_semaphore apicv_update_lock;
> > -
> > bool apic_access_memslot_enabled;
> > + bool apic_access_memslot_inhibited;
>
> So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> As I see later in the patch when you free the memslot, you set it to false,
> which means that if a vCPU is created after that (it can happen in theory),
> the memslot will be created again :(
>
> I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> enabled to false, and then on next vcpu run, that will free the memslot.
>
> When enabled == false, the code needs to be changed to not allocate it again.

This should be handled already. apic_access_memslot_enabled is toggled from
true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
flag is protected by slots_lock. Thus, newly created vCPUs are guaranteed to
either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.

int kvm_alloc_apic_access_page(struct kvm *kvm)
{
	struct page *page;
	void __user *hva;
	int ret = 0;

	mutex_lock(&kvm->slots_lock);
	if (kvm->arch.apic_access_memslot_enabled ||
	    kvm->arch.apic_access_memslot_inhibited)   <=== prevents reallocation
		goto out;

out:
	mutex_unlock(&kvm->slots_lock);
	return ret;
}

That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.

void kvm_free_apic_access_page(struct kvm *kvm)
{
	WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);

	mutex_lock(&kvm->slots_lock);

	if (kvm->arch.apic_access_memslot_enabled) {
		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
		kvm->arch.apic_access_memslot_enabled = false;
	}

	mutex_unlock(&kvm->slots_lock);
}
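
To make the invariant above concrete, here is a rough userspace toy model (not kernel code) of the two flags. Everything named model_* is hypothetical and exists only for illustration. The model is single-threaded, so slots_lock is deliberately not modeled, and the WARN_ON_ONCE is approximated by a counter. It only aims to show that once the inhibit flag is set, a later allocation attempt cannot bring the memslot back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags in struct kvm_arch. */
struct model_kvm {
	bool memslot_enabled;	/* models apic_access_memslot_enabled */
	bool memslot_inhibited;	/* models apic_access_memslot_inhibited */
	int warnings;		/* counts would-be WARN_ON_ONCE() hits */
};

void model_init(struct model_kvm *kvm)
{
	kvm->memslot_enabled = false;
	kvm->memslot_inhibited = false;
	kvm->warnings = 0;
}

/* vCPU creation path: allocate unless already enabled or inhibited. */
void model_alloc(struct model_kvm *kvm)
{
	/* The real code holds slots_lock here; the toy model is single-threaded. */
	if (kvm->memslot_enabled || kvm->memslot_inhibited)
		return;
	kvm->memslot_enabled = true;
}

/* Deletion path: only expected to run after the inhibit flag is set. */
void model_free(struct model_kvm *kvm)
{
	if (!kvm->memslot_inhibited)
		kvm->warnings++;	/* stands in for WARN_ON_ONCE() */

	if (kvm->memslot_enabled)
		kvm->memslot_enabled = false;
}

/* Walks through the "vCPU created after the memslot was freed" scenario. */
int model_demo(void)
{
	struct model_kvm kvm;

	model_init(&kvm);
	model_alloc(&kvm);		/* first vCPU creates the memslot */
	assert(kvm.memslot_enabled);

	kvm.memslot_inhibited = true;	/* some vCPU enabled x2APIC */
	model_free(&kvm);		/* deferred deletion runs */
	assert(!kvm.memslot_enabled);

	model_alloc(&kvm);		/* late vCPU must not reallocate */
	assert(!kvm.memslot_enabled);
	assert(kvm.warnings == 0);
	return 0;
}
```

Calling model_demo() walks through the sequence Maxim was worried about: create, inhibit, free, then create another vCPU. The second model_alloc() is a no-op because the inhibit flag is sticky.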

> > +
> > + /* Protects apicv_inhibit_reasons */
> > + struct rw_semaphore apicv_update_lock;
> > unsigned long apicv_inhibit_reasons;
> >
> > gpa_t wall_clock;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 99994d2470a2..70f00eda75b2 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > }
> > }
> >
> > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> >
> > + /*
> > + * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > + * the x2APIC inhibit is required. The actual deletion of the
> > + * memslot is handled by vcpu_run() as SRCU may or may not be
> > + * held at this time, i.e. updating memslots isn't safe. Don't
> > + * check apic_access_memslot_inhibited, this vCPU needs to
> > + * ensure the memslot is deleted before re-entering the guest,
> > + * i.e. needs to make the request even if the inhibit flag was
> > + * already set by a different vCPU.
> > + */
> > + if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > + static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > + vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > + kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
>
> You are about to remove KVM_REQ_UNBLOCK in another patch series.

No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

> How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> and having a special case in kvm_vcpu_update_apicv of
>
> if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> drop srcu lock

This was my initial thought as well, but the issue is that SRCU may or may not be
held, and so the unlock+lock would need to be conditional. That's technically a
solvable problem, as it's possible to detect if SRCU is held, but I really don't
want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
screw up SRCU.

> free the memslot
> take srcu lock
> }
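
For readers following the SRCU discussion, the deferral approach (mark the inhibit and raise a request in the x2APIC enable path, then do the actual memslot deletion from the run loop) might be modeled roughly as below. This is a userspace toy, not kernel code. All model_* names are hypothetical, the per-VM flags are collapsed into a single per-vCPU struct for brevity, the vendor inhibit check is assumed to return true, and the model simply assumes the run loop services the request at a point where the read-side lock state is known (here: not held).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, collapsed view of the per-VM and per-vCPU state. */
struct model_vcpu {
	int srcu_depth;		/* > 0 while the modeled SRCU read lock is held */
	bool request_pending;	/* models a pending request on this vCPU */
	bool memslot_enabled;	/* models apic_access_memslot_enabled */
	bool memslot_inhibited;	/* models apic_access_memslot_inhibited */
};

/*
 * x2APIC enable path: may be reached with or without the read lock held,
 * so it only sets flags and never touches the memslot itself.
 */
void model_enable_x2apic(struct model_vcpu *v)
{
	if (v->memslot_enabled) {
		v->memslot_inhibited = true;
		v->request_pending = true;
	}
}

/*
 * One iteration of the run loop. The request is serviced before the read
 * lock is taken for guest entry, so the modeled memslot update never runs
 * inside a read-side critical section.
 */
void model_vcpu_run_once(struct model_vcpu *v)
{
	assert(v->srcu_depth == 0);	/* assumption: known lock state here */

	if (v->request_pending) {
		v->request_pending = false;
		if (v->memslot_enabled)
			v->memslot_enabled = false;	/* memslot deletion */
	}

	v->srcu_depth++;	/* take the read lock for the entry path */
	/* the guest would run here */
	v->srcu_depth--;	/* drop it again on the way out */
}
```

The point of the split is that model_enable_x2apic() makes no assumption about srcu_depth, which sidesteps the conditional unlock+lock that the KVM_REQ_APICV_UPDATE variant would need.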