Re: [PATCH v4 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it

From: Wanpeng Li
Date: Fri Jul 08 2016 - 09:58:55 EST


2016-07-08 18:18 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>
>
> On 08/07/2016 02:38, Wanpeng Li wrote:
>> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>>>
>>>
>>> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>>>>>>
>>>>>>>> if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>>>>>> + (is_guest_mode(vcpu) ||
>>>>>>>> kvm_x86_ops->set_hv_timer(vcpu,
>>>>>>>> - kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>>>>>> + kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>>>>>> kvm_lapic_switch_to_sw_timer(vcpu);
>>>>>>>> if (check_tsc_unstable()) {
>>>>>>>> u64 offset = kvm_compute_tsc_offset(vcpu,
>>>>>>>>
>>>>>>
>>>>>> Thanks, this is good as a fallback. I'll try to fix it by getting the
>>>>>> pin-based execution controls right but if I fail this patch is okay.
>>>> I believe we still need this patch even if you implement "L1 TSC
>>>> deadline timer to trigger while L2 is running" eventually, the codes
>>>> you posted before:
>>>>
>>>> exec_control = vmcs12->pin_based_vm_exec_control;
>>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>> + if (vmx->hv_deadline_tsc == -1)
>>>> + exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>
>>>> So there is still case the preemption timer bit of vmcs02 is not set,
>>>> however, the scenario I mentioned above in kvm_arch_vcpu_load() will
>>>> set it unnecessary.
>>>
>>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
>>> if vmcs02 is the loaded one.
>>>
>>> This can happen if L2 has access to L1's local APIC registers (i.e. L1
>>> passes the local APIC instead of emulating it, as is the case in a
>>> partitioning hypervisor). While L2 runs, it writes to the TSC deadline
>>> MSR of L1. This causes a call to kvm_x86_ops->set_hv_timer while the
>>> active VMCS is a vmcs02.
>>
>> Yes, in the scenario you pointed out the call to
>> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct,
>> however, in the scenario I mentioned in the patch description is not
>> correct even if enable "L1 TSC deadline timer to trigger while L2 is
>> running".
>
> It doesn't help that you have not explained how to reproduce the
> bug---this is what the cover letter and commit messages are for, too.

I believe you pointed out this before:

| The patch looks correct, but I'm not sure how you get a preemption
| timer vmexit while vmcs02 is active:
|
| exec_control = vmcs12->pin_based_vm_exec_control;
| exec_control |= vmcs_config.pin_based_exec_ctrl;
| exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

We can't get preemption timer vmexit which vmcs02 is loaded since
preemtion timer bit in vmcs02 is not set, then how can we get the
incorrect preemption timer vmexit in nested guest which patch 1 fixed?
Because the scenario I mentioned in patch 2 set vmcs02.

w/o patch1 + w/o enable "L1 TSC deadline timer to trigger while L2 is
running" => we will get the bug calltrace mentioned in patch1 since
incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
apply patch2 can fix it.

However, after enable "L1 TSC deadline timer to trigger while L2 is
running", we should have patch 1 to correctly capture nested vmexit
for preemption timer.

My setup is L0 and L1 both are latest kvm queue branch w/ Yunhong's
preemption timer enable patches and my previous "fix missed
cancellation of TSC deadline timer" patches. I always enable
preemption timer in L0, but try to enable or disable preemption timer
in L1.

Regards,
Wanpeng Li