Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection

From: Peter Zijlstra
Date: Fri Nov 11 2022 - 14:34:13 EST


On Fri, Nov 11, 2022 at 06:06:12PM +0000, Li, Xin3 wrote:
> > On Fri, Nov 11, 2022 at 01:48:26PM +0100, Paolo Bonzini wrote:
> > > On 11/11/22 13:19, Peter Zijlstra wrote:
> > > > On Fri, Nov 11, 2022 at 01:04:27PM +0100, Paolo Bonzini wrote:
> > > > > On Intel you can optionally make it hold onto IRQs, but NMIs are
> > > > > always eaten by the VMEXIT and have to be reinjected manually.
> > > >
> > > > That 'optionally' thing worries me -- as in, KVM is currently
> > > > opting-out?
> > >
> > > Yes, because "If the “process posted interrupts” VM-execution control
> > > is 1, the “acknowledge interrupt on exit” VM-exit control is 1" (SDM
> > > 26.2.1.1, checks on VM-Execution Control Fields). Ipse dixit. Posted
> > > interrupts are available and used on all processors since I think Ivy Bridge.
> >
> > (imagine the non-coc compliant reaction here)
> >
> > So instead of fixing it, they made it worse :-(
> >
> > And now FRED is arguably making it worse again, and people wonder why I
> > hate virt...
>
> Maybe I take it wrong, but FRED doesn't make anything worse. Fred entry
> code will call external_interrupt() immediately for IRQs.

But what about NMIs? AFAICT this is all horribly broken for NMIs.

So the whole VMX thing latches the NMI (which stops NMI recursion),
right?

But then you drop out of noinstr code, which means any random exception
can happen (kprobes #BP, hw_breakpoint #DB, or even #PF due to random
nonsense like *SAN). This exception will do IRET and clear the NMI
latch, all before you get to run any of the NMI code.

Note how the normal NMI code is very careful to clear DR7 and both
kprobes and hw_breakpoint know not to accept noinstr code as targets.

You threw all that out the window.
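
To make the ordering problem concrete, here is a toy userspace model of the
latch, not kernel code. Every toy_* name is made up for illustration, and it
assumes (as I do above) that the NMI VM-exit leaves NMIs blocked and that any
IRET unblocks them. It deliberately ignores everything the real NMI entry
code does to cope with nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU state: only what is needed to show the latch problem. */
struct toy_cpu {
	bool nmi_blocked;	/* hardware "NMI in progress" latch */
	int nmi_depth;		/* NMI handlers currently running */
	int max_nmi_depth;	/* deepest nesting observed */
};

/* Any IRET clears the latch, even one returning from an unrelated exception. */
static void toy_iret(struct toy_cpu *cpu)
{
	cpu->nmi_blocked = false;
}

static void toy_enter_nmi_handler(struct toy_cpu *cpu)
{
	cpu->nmi_depth++;
	if (cpu->nmi_depth > cpu->max_nmi_depth)
		cpu->max_nmi_depth = cpu->nmi_depth;
}

static void toy_leave_nmi_handler(struct toy_cpu *cpu)
{
	assert(cpu->nmi_depth > 0);
	cpu->nmi_depth--;
	toy_iret(cpu);
}

/* A fresh hardware NMI is only delivered while the latch is clear. */
static void toy_hw_nmi(struct toy_cpu *cpu)
{
	if (cpu->nmi_blocked)
		return;			/* stays pending, no nesting */
	cpu->nmi_blocked = true;
	toy_enter_nmi_handler(cpu);
	toy_leave_nmi_handler(cpu);
}

/*
 * Returns the deepest NMI handler nesting seen when an NMI causes a
 * VM-exit and the host then calls the handler in software.
 */
static int toy_scenario(bool exception_before_handler)
{
	struct toy_cpu cpu = { 0 };

	cpu.nmi_blocked = true;		/* NMI VM-exit: latch is set */

	if (exception_before_handler)
		toy_iret(&cpu);		/* #BP/#DB/#PF handler returns via IRET */

	toy_enter_nmi_handler(&cpu);	/* software-dispatched NMI handler */
	toy_hw_nmi(&cpu);		/* second NMI arrives mid-handler */
	toy_leave_nmi_handler(&cpu);

	return cpu.max_nmi_depth;
}
```

In this model toy_scenario(false) yields a depth of 1 and toy_scenario(true)
yields 2: one stray exception before the handler runs is enough to let a
second NMI nest on top of it.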

Also, NMI is IST, and with FRED it will run on a different stack as
well; directly calling external_interrupt() doesn't honour that either.

> You really really don't like the context how VMX dispatches NMI/IRQs (which has
> been there for a long time), right?

I really really hate this with a passion. The fact that it's been this
way is no justification for keeping it. Crap is crap.

Intel should have followed SVM's example in this regard, and not
doubled down and extended this NMI hole to regular IRQs. These are
exactly the kind of exception delivery trainwrecks FRED is supposed to
fix, except in this case it appears it doesn't :/