Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT

From: Wanpeng Li
Date: Sat Mar 18 2017 - 02:38:06 EST


2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@xxxxxxxxxx>:
> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>
>> The L2 guest hangs when L1 uses shadow page tables on top of EPT. The trace on
>> L1 shows L2 repeatedly exiting with reason EXCEPTION_NMI on a page fault:
>>
>> qemu-system-x86-2821 [003] d..2 45.848814: kvm_entry: vcpu 0
>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>> qemu-system-x86-2821 [003] ...1 45.848827: kvm_page_fault: address fe05b error_code 14
>>
>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>> stops L2's PDPTRs from being loaded by dereferencing L2's CR3, since CR3 is
>> uninitialized in real mode. A Hyper-V L1 emulates L2 real mode with PAE
>> paging and EPT enabled. However, during system boot the guest passes through
>> Protected mode (a sub-mode of Legacy mode) on its way to Long mode, and the
>> check in nested_vmx_load_cr3() also prevents the PDPTRs from being loaded
>> while L2 is still in Protected mode w/ PAE paging and nested EPT/shadow page
>> tables on EPT. The original commit was presumably only intended to avoid
>> dereferencing L2's CR3 when the L1 hypervisor emulates L2's real mode
>> through virtual-8086 mode (vm86).
>>
>> This patch fixes it by allowing the PDPTRs to be loaded when PAE paging and
>> EPT are enabled and !vm86_active.
>>
>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>> Cc: Ladi Prosek <lprosek@xxxxxxxxxx>
>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c664365..2b2a05f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>> static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>> u32 *entry_failure_code)
>> {
>> - if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>> + if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>> if (!nested_cr3_valid(vcpu, cr3)) {
>> *entry_failure_code = ENTRY_FAIL_DEFAULT;
>> return 1;
>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>> * must not be dereferenced.
>> */
>> if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>> - !nested_ept) {
>> + !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>
> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
> as before 7ca29de21362.

Hmm, I missed that pdptrs_changed() also dereferences CR3.
How about something like this:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c664365..d7ebf03 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
u32 *entry_failure_code)
{
- if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
+ if (cr3 != kvm_read_cr3(vcpu) ||
+ (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
+ pdptrs_changed(vcpu))) {
if (!nested_cr3_valid(vcpu, cr3)) {
*entry_failure_code = ENTRY_FAIL_DEFAULT;
return 1;
@@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
* must not be dereferenced.
*/
if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
- !nested_ept) {
+ !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
*entry_failure_code = ENTRY_FAIL_PDPTE;
return 1;
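
To make the difference between the two checks easier to reason about, here
is a rough standalone sketch of just the PDPTR-load predicate, with the vcpu
state reduced to plain booleans. The struct and function names are
hypothetical and exist only for this illustration; they are not kernel code,
and the sketch says nothing about whether vm86_active is the right signal for
the Hyper-V case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vcpu state consulted by the check. */
struct l2_state {
	bool long_mode;   /* models is_long_mode(vcpu) */
	bool pae;         /* models is_pae(vcpu) */
	bool paging;      /* models is_paging(vcpu) */
	bool nested_ept;  /* models the nested_ept argument */
	bool vm86_active; /* models to_vmx(vcpu)->rmode.vm86_active */
};

/* Check as it stands after 7ca29de21362: nested EPT alone skips the load. */
static bool load_pdptrs_current(const struct l2_state *s)
{
	return !s->long_mode && s->pae && s->paging && !s->nested_ept;
}

/* Proposed check: skip only when nested EPT and vm86 are both in effect. */
static bool load_pdptrs_proposed(const struct l2_state *s)
{
	return !s->long_mode && s->pae && s->paging &&
	       !(s->nested_ept && s->vm86_active);
}
```

With nested_ept set, PAE paging on, long mode off and vm86_active clear, the
current check returns false and the proposed one returns true. With
vm86_active set as well, both return false.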

>
> I'll take a closer look next week. Is there an easy way for me to
> reproduce the issue you're seeing?

Run L1 KVM with EPT disabled (kvm_intel ept=0) on top of L0 KVM with EPT
enabled (ept=1), then boot an L2 guest.

Regards,
Wanpeng Li

>
>> if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>> *entry_failure_code = ENTRY_FAIL_PDPTE;
>> return 1;
>> --
>> 2.7.4
>>