Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest

From: Thomas Gleixner
Date: Thu Dec 02 2021 - 18:48:50 EST


Kirill,

On Fri, Dec 03 2021 at 01:21, Kirill A. Shutemov wrote:
> On Thu, Nov 25, 2021 at 01:40:24AM +0100, Thomas Gleixner wrote:
>> Kuppuswamy,
>> Either that or you provide patches with arguments which are based on
>> proper analysis and not on 'appears to' observations.
>
> I think the right solution to the WBINVD would be to add a #VE handler
> that does nothing. We don't have a reasonable way to handle it from within
> the guest. We can call the VMM in hope that it would handle it, but VMM is
> untrusted and it can ignore the request.
>
> Dave suggested that we need to do code audit to make sure that there's no
> user inside TDX guest environment that relies on WBINVD to work correctly.
>
> Below is full call tree of WBINVD. It is substantially larger than I
> anticipated from initial grep.
>
> Conclusions:
>
> - Most of callers are in ACPI code on changing S-states. Ignoring cache
> flush for S-state change on virtual machine should be safe.
>
> - The only WBINVD I was able to trigger is on poweroff from ACPI code.
> Reboot also should trigger it, but for some reason I don't see it.
>
> - Few caller in CPU offline code. TDX does not allowed to offline CPU as
> we cannot bring it back -- we don't have SIPI. And even if offline
> works for vCPU it should be safe to ignore WBINVD there.
>
> - NVDIMMs are not supported inside TDX. If it will change we would need
> to deal with cache flushing for this case. Hopefully, we would be able
> to avoid WBINVD.
>
> - Cache QoS and MTRR use WBINVD. They are disabled in TDX, but it is
> controlled by VMM if the feature is advertised. We would need to
> filter CPUID/MSRs to make sure VMM would not mess with them.
>
> Is it good enough justification for do-nothing #VE WBINVD handler?

first of all thank you very much for this very profound analysis.

This is really what I was asking for and you probably went even a step
deeper than that. Very appreciated.

What we should do instead of doing a wholesale let's ignore WBINVD is to
have a separate function/macro:

ACPI_FLUSH_CPU_CACHE_PHYS()

and invoke that from the functions which are considered to be safe.

That would default to ACPI_FLUSH_CPU_CACHE() for other architecures
obviously.

Then you can rightfully do:

#define ACPI_FLUSH_CPU_CACHE_PHYS() \
if (!cpu_feature_enabled(XXX)) \
wbinvd(); \

where $XXX might be FEATURE_TDX_GUEST for paranoia sake and then
extended to X86_FEATURE_HYPERVISOR if everyone agrees.

Then you have the #VE handler which just acts on any other wbinvd
invocation via warn, panic, whatever, no?

Thanks,

tglx