Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT

From: Paolo Bonzini
Date: Wed Mar 22 2017 - 10:17:35 EST




On 22/03/2017 14:44, Wanpeng Li wrote:
> 2017-03-22 20:00 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>:
>> On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>:
>>>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>>>
>>>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>>>
>>>>> qemu-system-x86-2821 [003] d..2 45.848814: kvm_entry: vcpu 0
>>>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>>>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_page_fault: address fe05b error_code 14
>>>>>
>>>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>>>> the original commit should just intended to prevent to dereference L2's CR3
>>>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>>>
>>>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>>>> !vm86_active.
>>>>>
>>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>>>> Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
>>>>> Cc: Ladi Prosek <lprosek@xxxxxxxxxx>
>>>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>>> ---
>>>>> arch/x86/kvm/vmx.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index c664365..2b2a05f 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>>> u32 *entry_failure_code)
>>>>> {
>>>>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>>>> + if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>>> if (!nested_cr3_valid(vcpu, cr3)) {
>>>>> *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>>> return 1;
>>>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>>> * must not be dereferenced.
>>>>> */
>>>>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>>>> - !nested_ept) {
>>>>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>>>
>>>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>>>> as before 7ca29de21362.
>>>
>>> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
>>> How about something like this:
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c664365..d7ebf03 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
>>> *vcpu, unsigned long val)
>>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
>>> cr3, bool nested_ept,
>>> u32 *entry_failure_code)
>>> {
>>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>> + if (cr3 != kvm_read_cr3(vcpu) ||
>>> + (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
>>> + pdptrs_changed(vcpu))) {
>>> if (!nested_cr3_valid(vcpu, cr3)) {
>>> *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>> return 1;
>>> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
>>> *vcpu, unsigned long cr3, bool ne
>>> * must not be dereferenced.
>>> */
>>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>> - !nested_ept) {
>>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>> if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>> *entry_failure_code = ENTRY_FAIL_PDPTE;
>>> return 1;
>>
>> Still the same, Hyper-V is broken. The problem is not in real vs.
>> protected mode. The way nested_ept_enabled is computed is incorrect.
>>
>> I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
>> you please give it a try?
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 98e82ee..9145c94 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
>> *vcpu, struct vmcs12 *vmcs12,
>> vmcs12->guest_intr_status);
>> }
>>
>> - nested_ept_enabled = (exec_control &
>> SECONDARY_EXEC_ENABLE_EPT) != 0;
>> + nested_ept_enabled =
>> (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
>>
>> /*
>> * Write an illegal value to APIC_ACCESS_ADDR. Later,
>
> You are right, it works. Please send out a formal patch and add the
> kvm-unit-tests as Paolo mentioned.

I do believe it would be more fair if _you_ send out the kvm-unit-tests
patch.

Paolo