Re: [PATCH v2 17/18] KVM: x86: flush TLB separately from MMU reset

From: Sean Christopherson
Date: Fri Feb 18 2022 - 18:57:32 EST


On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> For both CR0 and CR4, disassociate the TLB flush logic from the
> MMU role logic. Instead of relying on kvm_mmu_reset_context() being
> a superset of various TLB flushes (which is not necessarily going to
> be the case in the future), always call it if the role changes
> but also set the various TLB flush requests according to what is
> in the manual.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---

Code is good, a few nits on comments.

Reviewed-by: Sean Christopherson <seanjc@xxxxxxxxxx>

> @@ -1057,28 +1064,41 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>
> void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
> {
> + if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> + kvm_mmu_reset_context(vcpu);
> +
> /*
> - * If any role bit is changed, the MMU needs to be reset.
> - *
> - * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed.
> * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
> * according to the SDM; however, stale prev_roots could be reused
> * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
> - * free them all. KVM_REQ_MMU_RELOAD is fit for the both cases; it
> - * is slow, but changing CR4.PCIDE is a rare case.
> - *
> - * If CR4.PGE is changed, the guest TLB must be flushed.
> - *
> - * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
> - * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
> - * the usage of "else if".
> + * free them all. This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
> + * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
> + * so fall through.
> */
> - if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> - kvm_mmu_reset_context(vcpu);
> - else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
> + if (!tdp_enabled &&
> + (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
> kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> - else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
> - kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> + /*
> + * The TLB has to be flushed for all PCIDs on:
> + * - CR4.PCIDE changed from 1 to 0

Uber nit, grammatically this should use "a ... change", not "changed". And I
think it's worth calling out that the flush is architecturally required.
Something like this, though I don't like using "conditions" to describe the
cases (can't think of a bette word, obviously).

/*
* A TLB flush for all PCIDs is architecturally required if any of the
* following conditions is true:
* - CR4.PCIDE is changed from 1 to 0
* - CR4.PGE is toggled
*/

> + * - any change to CR4.PGE
> + *
> + * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
> + */
> + if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
> + (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +
> + /*
> + * The TLB has to be flushed for the current PCID on:
> + * - CR4.SMEP changed from 0 to 1
> + * - any change to CR4.PAE
> + */

Same nit plus "architecturally required" feedback fo rthis one.

> + else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
> + ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
> + kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +
> }
> EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
>
> @@ -11323,15 +11343,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> static_call(kvm_x86_update_exception_bitmap)(vcpu);
>
> /*
> - * Reset the MMU context if paging was enabled prior to INIT (which is
> + * A TLB flush is needed if paging was enabled prior to INIT (which is

I appreciate the cleverness in changing only a single like, but I think both
pieces warrant a mention. How 'bout this, to squeak by with two lines?

/*
* Reset the MMU and flush the TLB if paging was enabled (INIT only, as
* CR0 is currently guaranteed to be '0' prior to RESET). Unlike the

> * implied if CR0.PG=1 as CR0 will be '0' prior to RESET). Unlike the
> * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> * checked because it is unconditionally cleared on INIT and all other
> * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> */
> - if (old_cr0 & X86_CR0_PG)
> + if (old_cr0 & X86_CR0_PG) {
> + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> kvm_mmu_reset_context(vcpu);
> + }
>
> /*
> * Intel's SDM states that all TLB entries are flushed on INIT. AMD's
> --
> 2.31.1