Re: [RFC PATCH 5/6] KVM: X86: Alloc pae_root shadow page

From: Lai Jiangshan
Date: Wed Jan 05 2022 - 21:01:46 EST

On 2022/1/6 00:45, Sean Christopherson wrote:
On Wed, Jan 05, 2022, Lai Jiangshan wrote:
On Wed, Jan 5, 2022 at 5:54 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:


default_pae_pdpte is needed because the CPU expects the PAE PDPTEs to be
present at VM entry.

That's incorrect. Neither Intel nor AMD requires PDPTEs to be present. Not-present
is perfectly OK; what's not allowed is present with reserved bits set.

Intel SDM:
A VM entry that checks the validity of the PDPTEs uses the same checks that are
used when CR3 is loaded with MOV to CR3 when PAE paging is in use[7]. If MOV to CR3
would cause a general-protection exception due to the PDPTEs that would be loaded
(e.g., because a reserved bit is set), the VM entry fails.

7. This implies that (1) bits 11:9 in each PDPTE are ignored; and (2) if bit 0
(present) is clear in one of the PDPTEs, bits 63:1 of that PDPTE are ignored.

But in practice, the VM entry fails if the present bit is not set in the
PDPTE covering the linear address being accessed (at least when EPT is
enabled). The host KVM complains and dumps the VMCS state.
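For reference, under PAE paging with 4-KByte pages a 32-bit linear address splits as below, so bits 31:30 decide which of the four PDPTEs "the linear address being accessed" depends on. A minimal sketch; struct pae_indices and pae_split() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical decomposition of a 32-bit linear address under PAE paging
 * with 4-KByte pages. */
struct pae_indices {
	unsigned int pdpte;  /* bits 31:30, selects one of the 4 PDPTEs */
	unsigned int pde;    /* bits 29:21, index into the page directory */
	unsigned int pte;    /* bits 20:12, index into the page table */
	unsigned int offset; /* bits 11:0, byte offset within the page */
};

static struct pae_indices pae_split(uint32_t la)
{
	struct pae_indices idx = {
		.pdpte  = (la >> 30) & 0x3,
		.pde    = (la >> 21) & 0x1ff,
		.pte    = (la >> 12) & 0x1ff,
		.offset = la & 0xfff,
	};
	return idx;
}
```

For example, the x86 reset vector 0xFFFFFFF0 falls under PDPTE 3, while 0x000FFFF0 falls under PDPTE 0.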

That doesn't make any sense. If EPT is enabled, KVM should never use a pae_root.
The vmcs.GUEST_PDPTRn fields are in play, but those shouldn't derive from KVM's
shadow page tables.
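Sean's distinction can be written down as a small decision function. This is a sketch under the assumption, taken from the SDM's VM-entry checks on guest PDPTEs, that PDPTEs only matter when the CPU will use PAE paging (CR0.PG=1, CR4.PAE=1, IA32_EFER.LMA=0). The inputs are the values in the VMCS guest-state area, i.e. what the CPU actually runs with, which under shadow paging need not match what the guest believes. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical classification of where VM entry takes the guest PDPTEs from. */
enum pdpte_source {
	PDPTE_NONE,      /* not PAE paging: no PDPTEs are loaded or checked */
	PDPTE_FROM_VMCS, /* EPT on: the vmcs GUEST_PDPTR0..3 fields */
	PDPTE_FROM_CR3,  /* EPT off: read from the table CR3 points to, which
	                    under shadow paging is KVM's pae_root */
};

static enum pdpte_source vmentry_pdpte_source(bool cr0_pg, bool cr4_pae,
					      bool efer_lma, bool enable_ept)
{
	if (!cr0_pg || !cr4_pae || efer_lma)
		return PDPTE_NONE;
	return enable_ept ? PDPTE_FROM_VMCS : PDPTE_FROM_CR3;
}
```

Assuming KVM, when unrestricted guest is unavailable (always the case without EPT), runs the guest with CR0.PG and CR4.PAE forced on in hardware and EFER.LMA clear until the guest enters long mode, even a real-mode guest lands in the PDPTE_FROM_CR3 case, i.e. VM entry checks the PDPTEs in KVM's pae_root.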

Oh, I again wrote the opposite of what I meant; I rewrote the sentence
several times while trying to emphasize something.

I meant "EPT not enabled" (on VMX).

The VM entry fails at a very early stage of guest boot, when the guest may
still be in real mode.

VMEXIT: intr_info=00000000 errorcode=0000000 ilen=00000000
reason=80000021 qualification=0000000000000002

IDTVectoring: info=00000000 errorcode=00000000
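The dump above can be decoded by hand: bit 31 of the exit reason flags a failed VM entry, the low 16 bits give basic exit reason 33 (invalid guest state), and for that case an exit qualification of 2 indicates a failure while loading the PDPTEs. A minimal decoder follows; the constants are local copies whose names mirror those in Linux's vmx.h headers and should be double-checked against the tree, and vmentry_failed_on_pdptes() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies; names follow Linux's vmx.h headers. */
#define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000u
#define EXIT_REASON_INVALID_STATE       33u
#define ENTRY_FAIL_PDPTE                2u

/* Hypothetical helper: true if this exit is a VM-entry failure caused by
 * the guest PDPTEs, as in the dump above. */
static bool vmentry_failed_on_pdptes(uint32_t exit_reason, uint64_t qualification)
{
	bool failed_entry = exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY;
	uint16_t basic = exit_reason & 0xffff; /* basic exit reason */

	return failed_entry && basic == EXIT_REASON_INVALID_STATE &&
	       qualification == ENTRY_FAIL_PDPTE;
}
```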


And I doubt there is a VMX ucode bug at play, as KVM currently uses '0' in its
shadow page tables for not-present PDPTEs.

If you can post/provide the patches that lead to VM-Fail, I'd be happy to help
debug.

If you can try this patchset, you can simply set default_pae_pdpte to 0 to
test it.

If you can't try this patchset, you could instead modify mmu->pae_root to
test it.

My guess is that, in this case, VMX fails to translate %rip at VM entry.