Re: Question about AMD SVM's virtual NMI support in Linux kernel mainline

From: Santosh Shukla
Date: Fri Aug 25 2023 - 05:07:43 EST


Hi Sean,


On 8/24/2023 7:52 PM, Sean Christopherson wrote:
> +kvm and lkml, didn't realize they weren't Cc'd in the original mail.
>
> On Thu, Aug 24, 2023, Santosh Shukla wrote:
>> Hi Sean,
>>
>> On 8/23/2023 8:33 PM, Sean Christopherson wrote:
>>> On Fri, Aug 18, 2023, 陈培鸿(乘鸿) wrote:
>>>> According to the results, I found that in the case of concurrent NMIs, some
>>>> NMIs are still injected through eventinj instead of vNMI.
>>>
>>> Key word is "some", having two NMIs arrive "simultaneously" is uncommon. In quotes
>>> because if a vCPU is scheduled out or otherwise delayed, two NMIs can be recognized
>>> by KVM at the same time even if there was a sizeable delay between when they were
>>> sent.
>>>
>>>> Based on the above explanations, I summarize my questions as follows:
>>>> 1. According to the specification of AMD SVM vNMI, with vNMI enabled, will
>>>> some NMIs be injected through eventinj under the condition of concurrent
>>>> NMIs?
>>>
>>> Yes.
>>>
>>>> 2. If yes, what is the role of vNMI? Is it just an addition to eventinj? What
>>>> benefits is it designed to expect? Is there any benchmark data support?
>>>
>>> Manually injecting NMIs isn't problematic from a performance perspective. KVM
>>> takes control of the vCPU, i.e. forces a VM-Exit, to pend a virtual NMI, so there's
>>> no extra VM-Exit.
>>>
>>> The value added by vNMI support is that KVM doesn't need to manually track/detect
>>> when NMIs become unblocked in the guest. SVM doesn't provide hardware-supported
>>> NMI-window exiting, so KVM has to intercept _and_ single-step IRET, which adds two
>>> VM-Exits for _every_ NMI when vNMI isn't available (and it's a complete mess for
>>> things like SEV-ES).
>>>
>>>> 3. If not, does it mean that the current SVM's vNMI support scheme in the
>>>> Linux mainline code is flawed? How should it be fixed?
>>>
>>> The approach as a whole isn't flawed; it's the best KVM can do given SVM's vNMI
>>> architecture and KVM's ABI with respect to "concurrent" NMIs.
>>>
>>> Hrm, though there does appear to be a bug in the injecting path. KVM doesn't
>>> manually set V_NMI_BLOCKING_MASK, and will unnecessarily enable IRET interception
>>> when manually injecting a vNMI. Intercepting IRET should be unnecessary because
>>> hardware will automatically accept the pending vNMI when NMIs become unblocked.
>>> And I don't see anything in the APM that suggests hardware will set V_NMI_BLOCKING_MASK
>>> when software directly injects an NMI.
>>>
>>> So I think we need:
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index d381ad424554..c956a9f500a2 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -3476,6 +3476,11 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>>  	if (svm->nmi_l1_to_l2)
>>>  		return;
>>>
>>> +	if (is_vnmi_enabled(svm)) {
>>> +		svm->vmcb->control.int_ctl |= V_NMI_BLOCKING_MASK;
>>> +		return;
>>> +	}
>>> +
>>>  	svm->nmi_masked = true;
>>>  	svm_set_iret_intercept(svm);
>>>  	++vcpu->stat.nmi_injections;
>>> --
>>>
>>> or if hardware does set V_NMI_BLOCKING_MASK in this case, just:
>>>
>>
>> Yes, HW does set BLOCKING_MASK when HW takes the pending vNMI event.
>
> I'm not asking about the pending vNMI case, which is clearly spelled out in the
> APM. I'm asking about directly injecting an NMI via:
>
> svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>

Yes. This is documented in APM as well.
https://www.amd.com/system/files/TechDocs/24593.pdf : "15.21.10 NMI Virtualization"

"
If Event Injection is used to inject an NMI when NMI Virtualization is enabled,
VMRUN sets V_NMI_MASK in the guest state.
"

>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index d381ad424554..201a1a33ecd2 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -3473,7 +3473,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>>
>>>  	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
>>>
>>> -	if (svm->nmi_l1_to_l2)
>>> +	if (svm->nmi_l1_to_l2 || is_vnmi_enabled(svm))
>>>  		return;
>>>
>>>  	svm->nmi_masked = true;
>>> --
>>>
>>
>> The above proposal makes sense to me. I was reviewing the source code flow
>> for the scenario where two consecutive NMIs need to be delivered to the guest.
>> For example, process_nmi will pend the first NMI, and then the second NMI
>> will be injected through EVTINJ, because kvm_x86_inject_nmi will be called
>> and that will enable the IRET intercept.
>>
>> With your proposal, inject_nmi will set the EVTINJ NMI without the IRET
>> intercept. Could we instead check "is_vnmi_enabled" before programming
>> "event_inj"?
>
> No, because the whole point of this path is to directly inject an NMI when NMIs
> are NOT blocked in the guest AND there is already a pending vNMI.

I agree, your patch looks fine.

I'll test the above patch and share the feedback.

Thanks,
Santosh