Re: [PATCH v12 07/22] x86/virt/tdx: Add skeleton to enable TDX on demand

From: Peter Zijlstra
Date: Mon Jul 03 2023 - 11:03:59 EST


On Mon, Jul 03, 2023 at 07:40:55AM -0700, Dave Hansen wrote:
> On 7/3/23 03:49, Peter Zijlstra wrote:
> >> There are also latency and noisy neighbor concerns, e.g. we *really* don't want
> >> to end up in a situation where creating a TDX guest for a customer can observe
> >> arbitrary latency *and* potentially be disruptive to VMs already running on the
> >> host.
> > Well, that's a quality of implementation issue with the whole TDX
> > crapola. Sounds like we want to impose latency constraints on the
> > various TDX calls. Allowing it to consume arbitrary amounts of CPU time
> > is unacceptable in any case.
>
> For what it's worth, everybody knew that calling into the TDX module was
> going to be a black hole and that consuming large amounts of CPU at
> random times would drive people bat guano crazy.
>
> The TDX Module ABI spec does have "Leaf Function Latency" warnings for
> some of the module calls. But, it's basically a binary thing. A call
> is either normal or "longer than most".
>
> The majority of the "longer than most" cases are for initialization.
> The _most_ obscene runtime ones are chunked up and can return partial
> progress to limit latency spikes. But I don't think folks tried as hard
> on the initialization calls since they're only called once which
> actually seems pretty reasonable to me.
>
> Maybe we need three classes of "Leaf Function Latency":
> 1. Sane
> 2. "Longer than most"
> 3. Better turn the NMI watchdog off before calling this. :)
>
> Would that help?

I'm thikning we want something along the lines of the Xen preemptible
hypercalls, except less crazy. Where the caller does:

for (;;) {
ret = tdcall(fn, args);
if (ret == -EAGAIN) {
cond_resched();
continue;
}
break;
}

And then the TDX black box provides a guarantee that any one tdcall (or
seamcall or whatever) never takes more than X ns (possibly even
configurable) and we get to raise a bug report if we can prove it
actually takes longer.

Handing the CPU off to random code for random period of time is just not
a good idea, ever.