Re: [PATCH v12 07/22] x86/virt/tdx: Add skeleton to enable TDX on demand

From: Dave Hansen
Date: Mon Jul 03 2023 - 10:41:03 EST


On 7/3/23 03:49, Peter Zijlstra wrote:
>> There are also latency and noisy neighbor concerns, e.g. we *really* don't want
>> to end up in a situation where creating a TDX guest for a customer can observe
>> arbitrary latency *and* potentially be disruptive to VMs already running on the
>> host.
> Well, that's a quality of implementation issue with the whole TDX
> crapola. Sounds like we want to impose latency constraints on the
> various TDX calls. Allowing it to consume arbitrary amounts of CPU time
> is unacceptable in any case.

For what it's worth, everybody knew that calling into the TDX module was
going to be a black hole and that consuming large amounts of CPU at
random times would drive people bat guano crazy.

The TDX Module ABI spec does have "Leaf Function Latency" warnings for
some of the module calls. But, it's basically a binary thing. A call
is either normal or "longer than most".

The majority of the "longer than most" cases are for initialization.
The _most_ obscene runtime ones are chunked up and can return partial
progress to limit latency spikes. But I don't think folks tried as hard
on the initialization calls since they're only called once which
actually seems pretty reasonable to me.

Maybe we need three classes of "Leaf Function Latency":
1. Sane
2. "Longer than most"
3. Better turn the NMI watchdog off before calling this. :)

Would that help?