Re: [RFC v2-fix-v4 1/1] x86/tdx: Skip WBINVD instruction for TDX guest

From: Dan Williams
Date: Wed Jun 09 2021 - 00:23:05 EST


On Tue, Jun 8, 2021 at 9:02 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On 6/8/21 8:40 PM, Dan Williams wrote:
> > On Tue, Jun 8, 2021 at 6:10 PM Kuppuswamy Sathyanarayanan
> > <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Current TDX spec does not have support to emulate the WBINVD
> >> instruction. If any feature that uses WBINVD is enabled/used
> >> in TDX guest, it will lead to un-handled #VE exception, which
> >> will be handled as #GP fault.
> >>
> >> ACPI drivers also uses WBINVD instruction for cache flushes in
> >> reboot or shutdown code path. Since TDX guest has requirement
> >> to support shutdown feature, skip WBINVD instruction usage
> >> in ACPI drivers for TDX guest.
> >
> > This sounds awkward...
> >
> >> Since cache is always coherent in TDX guests, making wbinvd as
> >
> > This is incorrect, ACPI cache flushing is not about I/O or CPU coherency...
> >
> >> noop should not cause any issues in above mentioned code path.
> >
> > ..."should" is a famous last word...
> >
> >> The end-behavior is the same as KVM guest (treat as noops).
> >
> > ..."KVM gets away with it" is not a justification that TDX can stand
> > on, otherwise we would not be here fixing up ACPICA properly.
> >
> > How about:
> >
> > "TDX guests use standard ACPI mechanisms to signal sleep state entry
> > (including reboot) to the host. The ACPI specification mandates WBINVD
> > on any sleep state entry with the expectation that the platform is
> > only responsible for maintaining the state of memory over sleep
> > states, not preserving dirty data in any CPU caches. ACPI cache
> > flushing requirements pre-date the advent of virtualization. Given TDX
> > guest sleep state entry does not affect any host power rails it is not
> > required to flush caches. The host is responsible for maintaining
> > cache state over its own bare metal sleep state transitions that
> > power-off the cache. If the host fails to manage caches over its sleep
> > state transitions the guest..."
> >
>
> I like this description, but shouldn't the logic be:
>
>         if (!CPUID has hypervisor bit set)
>                 wbinvd();
>
> As far as I know, most hypervisors will turn WBINVD into a noop and,
> even if they don't, it seems to me that something must be really quite
> wrong for a guest to need to WBINVD for ACPI purposes.

Agreed, a well-behaved guest should not pretend its callouts to the
virtual ACPI BIOS actually affect a host power rail.
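
For illustration only, here is a rough userspace sketch of the check Andy
describes. It is not the proposed patch and it is untested against any kernel
tree. The hypervisor-present indication is CPUID leaf 1, ECX bit 31. In the
kernel the equivalent test would presumably be
boot_cpu_has(X86_FEATURE_HYPERVISOR). The helper names below
(ecx_has_hypervisor_bit, running_under_hypervisor, flush_caches_stub,
sleep_entry_flush) are hypothetical, and the flush is a counting stub because
WBINVD is a privileged instruction that cannot be issued from userspace.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* GCC/Clang helper: __get_cpuid() */
#endif

/* CPUID.01H:ECX bit 31 is set by hypervisors to advertise their presence. */
#define CPUID_1_ECX_HYPERVISOR	(UINT32_C(1) << 31)

/* Pure bit test, kept separate so it can be checked without running CPUID. */
static bool ecx_has_hypervisor_bit(uint32_t ecx)
{
	return (ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}

/* Hypothetical helper: true when CPUID reports that we run as a guest. */
static bool running_under_hypervisor(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the requested leaf is unsupported. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return ecx_has_hypervisor_bit(ecx);
#else
	return false;	/* Assumption: on non-x86 builds, report "not a guest". */
#endif
}

/*
 * Hypothetical stand-in for wbinvd(): the real instruction is privileged,
 * so this sketch only counts how often a flush would have been issued.
 */
static unsigned int flush_calls;

static void flush_caches_stub(void)
{
	flush_calls++;
}

/* Flush on sleep state entry only when running on bare metal. */
static void sleep_entry_flush(bool is_guest)
{
	if (!is_guest)
		flush_caches_stub();
}
```

A caller would do something like sleep_entry_flush(running_under_hypervisor()).
Taking the guest check as a parameter keeps the policy (skip the flush in a
guest) separate from the detection mechanism, which is the part still under
discussion in this thread.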