Re: [PATCH v19 059/130] KVM: x86/tdp_mmu: Don't zap private pages for unsupported cases

From: Xiaoyao Li
Date: Wed Mar 27 2024 - 20:26:31 EST


On 3/26/2024 3:55 AM, Edgecombe, Rick P wrote:
On Mon, 2024-03-25 at 12:05 -0700, Isaku Yamahata wrote:
Right, the guest has to accept it on VE.  If the unmap was intentional by guest,
that's fine.  The unmap is unintentional (with vMTRR), the guest doesn't expect
VE with the GPA.


But, I guess we should punt to userspace is the guest tries to use
MTRRs, not that userspace can handle it happening in a TD...  But it
seems cleaner and safer then skipping zapping some pages inside the
zapping code.

I'm still not sure if I understand the intention and constraints fully.
So please correct. This (the skipping the zapping for some operations)
is a theoretical correctness issue right? It doesn't resolve a TD
crash?

For lapic, it's safe guard. Because TDX KVM disables APICv with
APICV_INHIBIT_REASON_TDX, apicv won't call kvm_zap_gfn_range().
Ah, I see it:
https://lore.kernel.org/lkml/38e2f8a77e89301534d82325946eb74db3e47815.1708933498.git.isaku.yamahata@xxxxxxxxx/

Then it seems a warning would be more appropriate if we are worried there might be a way to still
call it. If we are confident it can't, then we can just ignore this case.


For MTRR, the purpose is to make the guest boot (without the guest kernel
command line like clearcpuid=mtrr) .
If we can assume the guest won't touch MTRR registers somehow, KVM can return an
error to TDG.VP.VMCALL<RDMSR, WRMSR>(MTRR registers). So it doesn't call
kvm_zap_gfn_range(). Or we can use KVM_EXIT_X86_{RDMSR, WRMSR} as you suggested.

My understanding is that Sean prefers to exit to userspace when KVM can't handle something, versus
making up behavior that keeps known guests alive. So I would think we should change this patch to
only be about not using the zapping roots optimization. Then a separate patch should exit to
userspace on attempt to use MTRRs. And we ignore the APIC one.

Certainly no. If exit to userspace, what is the exit reason and what is expected for userspace to do? userspace can do nothing, except either kill the TD or eat the RDMSR/WRMSR.

There is nothing to do with userspace. MTRR is virtualized as fixed1 for TD (by current TDX architecture). Userspace can do nothing on it and it's not userspace's fault to let TD guest manipulate on MTRR MSRs.

This is the bad design of current TDX, what KVM should do is return error to TD on TDVMCALL of WR/RDMSR on MTRR MSRs. This should be a known flaw of TDX that MTRR is not supported though TD guest reads the MTRR CPUID as 1.

This flaw should be fixed by TDX architecture that making MTRR configurable. At that time, userspace is responsible to set MSR filter on MTRR MSRs if it wants to configure the MTRR CPUID to 1.

This is trying to guess what maintainers would want here. I'm less sure what Paolo prefers.