Re: [PATCHv2 04/29] x86/traps: Add #VE support for TDX guest

From: Kirill A. Shutemov
Date: Fri Feb 11 2022 - 20:44:25 EST


On Tue, Feb 01, 2022 at 10:02:41PM +0100, Thomas Gleixner wrote:
> > +/*
> > + * Virtualization Exceptions (#VE) are delivered to TDX guests due to
> > + * specific guest actions which may happen in either user space or the
> > + * kernel:
> > + *
> > + * * Specific instructions (WBINVD, for example)
> > + * * Specific MSR accesses
> > + * * Specific CPUID leaf accesses
> > + * * Access to unmapped pages (EPT violation)
> > + *
> > + * In the settings that Linux will run in, virtualization exceptions are
> > + * never generated on accesses to normal, TD-private memory that has been
> > + * accepted.
> > + *
> > + * Syscall entry code has a critical window where the kernel stack is not
> > + * yet set up. Any exception in this window leads to hard to debug issues
> > + * and can be exploited for privilege escalation. Exceptions in the NMI
> > + * entry code also cause issues. Returning from the exception handler with
> > + * IRET will re-enable NMIs and nested NMI will corrupt the NMI stack.
> > + *
> > + * For these reasons, the kernel avoids #VEs during the syscall gap and
> > + * the NMI entry code. Entry code paths do not access TD-shared memory,
> > + * MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves
> > + * that might generate #VE.
>
> How is that enforced or validated? What checks for a violation of that
> assumption?

Hm. I think we would have to rely on code audit for it.

Entry code has no #VE-inducing instructions: no port I/O, CPUID, HLT,
MONITOR/MWAIT, WBINVD/INVD, VMCALL.
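
Part of such an audit could be helped along mechanically. Below is a rough
sketch only, not something from the patch set: a small C helper that flags a
disassembly line whose mnemonic is in the list above. The function name, the
input format (one instruction per line, mnemonic first) and the "in"/"out"
entries standing in for port I/O are all assumptions made for illustration.
It does not look at MSR numbers or memory operands, so it cannot replace
reading the code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Mnemonics from the list in this mail. "in" and "out" are an assumed,
 * incomplete stand-in for "port I/O" (string and suffixed forms are not
 * covered).
 */
static const char *const ve_mnemonics[] = {
	"cpuid", "hlt", "monitor", "mwait", "wbinvd", "invd", "vmcall",
	"in", "out",
};

/* Case-insensitive compare of a length-delimited token with a lowercase name. */
static bool token_equals(const char *tok, size_t len, const char *name)
{
	if (strlen(name) != len)
		return false;
	for (size_t i = 0; i < len; i++) {
		if (tolower((unsigned char)tok[i]) != name[i])
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: returns true if the first whitespace-delimited
 * token of @line is one of the mnemonics above. Assumes the caller has
 * already stripped addresses and opcode bytes from the disassembly.
 */
bool line_has_ve_mnemonic(const char *line)
{
	size_t len = 0;

	/* Skip leading whitespace to reach the mnemonic. */
	while (isspace((unsigned char)*line))
		line++;

	/* Measure the mnemonic token. */
	while (line[len] && !isspace((unsigned char)line[len]))
		len++;

	for (size_t i = 0; i < sizeof(ve_mnemonics) / sizeof(ve_mnemonics[0]); i++) {
		if (token_equals(line, len, ve_mnemonics[i]))
			return true;
	}
	return false;
}
```

Fed the instructions of the entry text one line at a time, any line for which
line_has_ve_mnemonic() returns true would be a candidate for manual review.
Exact token matching keeps "invlpg" from being mistaken for "invd".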

There's a single MSR read, of MSR_GS_BASE in paranoid_entry(), but it
doesn't trigger #VE either.

Another possible source of #VE is shared memory. If somebody tricks the
kernel into accessing shared memory from entry code, we have a bigger
problem to deal with than a #VE in the syscall gap.

Or do you have something stricter than a code audit in mind? I don't see
it.

--
Kirill A. Shutemov