RE: [PATCH v13 24/35] x86/fred: Add a NMI entry stub for FRED

From: Li, Xin3
Date: Fri Dec 15 2023 - 13:37:57 EST


> So we have recently discovered an overlooked interaction with VT-x.
> Immediately before VMENTER and after VMEXIT, CR2 is live with the
> *guest* CR2. Regardless of if the guest uses FRED or not, this is guest
> state and SHOULD NOT be corrupted. Furthermore, host state MUST NOT leak
> into the guest.
>
> NMIs are blocked on VMEXIT if the cause was an NMI, but not for other
> reasons, so a NMI coming in during this window that then #PFs could
> corrupt the guest CR2.

I add a comment to vmx_vcpu_enter_exit() in
https://lore.kernel.org/kvm/20231108183003.5981-1-xin3.li@xxxxxxxxx/T/#m29616c02befc04305085b1cbac64df916364626a
for this.

>
> Intel is exploring ways to close this hole, but for schedule reasons, it
> will not be available at the same time as the first implementation of
> FRED. Therefore, as a workaround, it turns out that the FRED NMI stub
> *will*, unfortunately, have to save and restore CR2 after all when (at
> least) Intel KVM is in use.
>
> Note that this is airtight: it does add a performance penalty to the NMI
> path (two read CR2 in the common case of no #PF), but there is no gap
> during which a bad CR2 value could be introduced in the guest, no matter
> in which sequence the events happen.
>
> In theory the performance penalty could be further reduced by
> conditionalizing this on the NMI happening in the critical region in the
> KVM code, but it seems to be pretty far from necessary to me.

We should keep the following code in the FRED NMI handler, right?

{
...
this_cpu_write(nmi_cr2, read_cr2());
...
if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
write_cr2(this_cpu_read(nmi_cr2));
...
}

> This obviously was an unfortunate oversight on our part, but the
> workaround is simple and doesn't affect any non-NMI paths.
>
> > +
> > + if (IS_ENABLED(CONFIG_SMP) &&
> arch_cpu_is_offline(smp_processor_id()))
> > + return;
> > +
>
> This is cut & paste from elsewhere in the NMI code, but I believe the
> IS_ENABLED() is unnecessary (not to mention ugly): smp_processor_id()
> should always return zero on UP, and arch_cpu_is_offline() reduces to
> !(cpu == 0), so this is a statically true condition on UP.

Ah, good point!