Re: [PATCH v2] KVM: VMX: Enable Notify VM exit

From: Xiaoyao Li
Date: Mon Jun 07 2021 - 05:23:47 EST


On 6/3/2021 9:52 PM, Vitaly Kuznetsov wrote:
Xiaoyao Li <xiaoyao.li@xxxxxxxxx> writes:

On 6/2/2021 6:31 PM, Vitaly Kuznetsov wrote:
Tao Xu <tao3.xu@xxxxxxxxx> writes:

There are cases where a malicious virtual machine can cause the CPU to get
stuck (event windows don't open up), e.g., an infinite loop in microcode on
nested #AC (CVE-2015-5307). No event window obviously means no events:
NMIs, SMIs, and IRQs will all be blocked, which may leave the affected
hardware CPU unusable by the host or other VMs.

To resolve those cases, the CPU can trigger a notify VM exit if no event
window occurs in VMX non-root mode for a specified amount of time
(the notify window). Once the CPU first observes the risk of not making
forward progress, a Notify VM exit happens after the notify window, which
is measured in units of the crystal clock, has elapsed. A Notify VM exit
can happen incident to delivery of a vectored event.

Expose a module param for configuring the notify window, which is in units
of crystal clock cycles.
- A negative value (e.g. -1) disables this feature.
- The default is 0. This is safe because an internal threshold is added
to the notify window to ensure all normal instructions are covered.
- Users can set it to a larger value when they want to allow more cycles
for some reason, e.g., silicon wrongly kills some normal instruction
because the internal threshold is too small.
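
The parameter semantics listed above can be sketched in a few lines of C.
This is only an illustration, not the code from the patch: the struct and
the function name notify_cfg_from_param() are hypothetical, and the
internal threshold added by hardware is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded module parameter. */
struct notify_cfg {
	bool enabled;     /* would notify VM exiting be turned on? */
	uint32_t window;  /* notify window in crystal clock cycles */
};

/*
 * Decode the parameter as described in the commit message:
 *   negative -> feature disabled
 *   0        -> enabled with the default window (hardware adds its own
 *               internal threshold on top, which is not modelled here)
 *   positive -> enabled with a larger, user-chosen window
 */
static struct notify_cfg notify_cfg_from_param(int notify_window)
{
	struct notify_cfg cfg = { .enabled = false, .window = 0 };

	if (notify_window < 0)
		return cfg;

	cfg.enabled = true;
	cfg.window = (uint32_t)notify_window;
	return cfg;
}
```
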

Notify VM exit is defined in the latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.

Co-developed-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
Signed-off-by: Tao Xu <tao3.xu@xxxxxxxxx>
---

Changelog:
v2:
Set notify window to 0 by default; a value less than 0 disables it.
Add more description in commit message.

Sorry if this was already discussed, but in case of nested
virtualization and when L1 also enables
SECONDARY_EXEC_NOTIFY_VM_EXITING, shouldn't we just reflect NOTIFY exits
during L2 execution to L1 instead of crashing the whole L1?


Yes. If we expose it to nested, it should reflect the Notify VM exit to
L1 when L1 enables it.

But regarding nested, there are more things that need to be discussed, e.g.:
1) There is a dependency between L0 and L1, for security reasons. When
L0 enables it, it shouldn't be turned off while the L2 VM is running.
a. Don't expose it to L1, but enable it for L1 while the L2 VM is running.
b. Expose it to L1 and force it enabled.

Could you please elaborate on the 'security' concern?

I mean that if we expose this feature to the L1 VMM, the L1 VMM cannot
enable or disable this feature at will when L0 has turned it on.

i.e., vmcs02.settings has to be (L0's | L1's)

Otherwise an L1 guest can escape by creating a nested guest and disabling it.

My understanding is
that during L2 execution:
If L0 enables the feature and L1 doesn't, vmexit goes to L0.
If L1 enables the feature and L0 doesn't, vmexit goes to L1.

If both L0 and L1 enable the feature, vmexit can probably (I didn't put
enough thought into it, I'm afraid) go to the one which has the smaller window.

It sounds reasonable.
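
The routing described above can be written down as a small decision
function. This is a sketch of the idea under discussion, not code from
the patch or from KVM: the enum, the function name notify_exit_target(),
and the tie-break when both windows are equal (L0 wins) are assumptions
made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: who would handle a Notify VM exit from L2. */
enum exit_target { EXIT_NONE, EXIT_TO_L0, EXIT_TO_L1 };

static enum exit_target notify_exit_target(bool l0_en, uint32_t l0_win,
					   bool l1_en, uint32_t l1_win)
{
	if (!l0_en && !l1_en)
		return EXIT_NONE;   /* feature off everywhere: no such exit */
	if (!l1_en)
		return EXIT_TO_L0;  /* only L0 asked for it */
	if (!l0_en)
		return EXIT_TO_L1;  /* only L1 asked for it */

	/*
	 * Both enabled: tentatively route to whoever has the smaller
	 * window. Giving L0 the tie is an assumption of this sketch.
	 */
	return l1_win < l0_win ? EXIT_TO_L1 : EXIT_TO_L0;
}
```
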


2) When exposing it to L1, vmcs02.notify_window needs to be
min(L0.notify_window, L1.notify_window)

We don't deal with nested, to keep this patch simple.
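
For reference, points 1) and 2) combine into the following merge rule for
vmcs02. This is a sketch only: struct notify_ctl and merge_notify_ctl()
are hypothetical names, not existing KVM interfaces, and using the window
of whichever side enabled the feature when only one does is an assumption
that goes beyond what is stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-level view of the notify VM exit control. */
struct notify_ctl {
	bool enabled;     /* notify VM exiting requested by this level */
	uint32_t window;  /* requested notify window */
};

/* Compute what vmcs02 would carry while L2 runs. */
static struct notify_ctl merge_notify_ctl(struct notify_ctl l0,
					  struct notify_ctl l1)
{
	struct notify_ctl out = { .enabled = false, .window = 0 };

	/* settings = L0's | L1's: L1 can never turn off what L0 turned on */
	out.enabled = l0.enabled || l1.enabled;

	if (l0.enabled && l1.enabled)
		out.window = l0.window < l1.window ? l0.window : l1.window;
	else if (l0.enabled)
		out.window = l0.window;  /* assumption: only L0's window counts */
	else if (l1.enabled)
		out.window = l1.window;  /* assumption: only L1's window counts */

	return out;
}
```
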

Sure, I just wanted to check with you what the future plan is and whether
the behavior you introduce is desirable in the nested case.