Re: [PATCH 1/4] KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid()

From: Sean Christopherson
Date: Tue Oct 19 2021 - 11:26:57 EST


On Tue, Oct 19, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> The KVM doesn't know whether any TLB for a specific pcid is cached in
> the CPU when tdp is enabled. So it is better to flush all the guest
> TLB when invalidating any single PCID context.
>
> The case is rare or even impossible since KVM doesn't intercept CR3
> write or INVPCID instructions when tdp is enabled. The fix is just
> for the sake of robustness in case emulation can reach here or the
> interception policy is changed.
>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---
> arch/x86/kvm/x86.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c59b63c56af9..06169ed08db0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1073,6 +1073,16 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
> unsigned long roots_to_free = 0;
> int i;
>
> + /*
> + * It is very unlikely to reach here when tdp_enabled. But if it is
> + * the case, the kvm doesn't know whether any TLB for the @pcid is
> + * cached in the CPU. So just flush the guest instead.
> + */
> + if (unlikely(tdp_enabled)) {

This is reachable on VMX if EPT=1, unrestricted_guest=0, and CR0.PG=0. In that
case, KVM is running the guest with the KVM-defined identity mapped CR3 / page
tables and intercepts MOV CR3 so that the guest can't ovewrite the "real" CR3,
and so that the guest sees its last written CR3 on read.

This is also reachable from the emulator if the guest manipulates a vCPU code
stream so that KVM sees a MOV CR3 after a legitimate emulation trigger.

However, in both cases the KVM_REQ_TLB_FLUSH_GUEST is unnecessary. In the first
case, paging is disabled so there are no TLB entries from the guest's perspective.
In the second, the guest is malicious/broken and gets to keep the pieces.

That said, I agree a sanity check is worthwhile, though with a reworded comment
to call out the known scenarios and that the TDP page tables are not affected by
the invalidation. Maybe this?

/*
* MOV CR3 and INVPCID are usually not intercepted when using TDP, but
* this is reachable when running EPT=1 and unrestricted_guest=0, and
* also via the emulator. KVM's TDP page tables are not in the scope of
* the invalidation, but the guest's TLB entries need to be flushed as
* the CPU may have cached entries in its TLB for the target PCID.
*/

> + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> + return;
> + }
> +
> /*
> * If neither the current CR3 nor any of the prev_roots use the given
> * PCID, then nothing needs to be done here because a resync will
> --
> 2.19.1.6.gb485710b
>