Re: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error

From: Dave Hansen
Date: Wed Nov 23 2022 - 13:19:10 EST


On 11/23/22 09:37, Sean Christopherson wrote:
> On Wed, Nov 23, 2022, Dave Hansen wrote:
>> There's no way we can guarantee _that_. For one, the PAMT* allocations
>> can always fail. I guess we could ask sysadmins to fire up a guest to
>> "prime" things, but that seems a little silly. Maybe that would work as
>> the initial implementation that we merge, but I suspect our users will
>> demand more determinism, maybe a boot or module parameter.
> Oh, you mean all of TDX initialization? I thought "initialization" here meant just
> doing tdx_enable().

Yes, but the first call to tdx_enable() does TDH_SYS_INIT and all the
subsequent work to get the module going.

> Yeah, that's not going to be a viable option. Aside from lacking determinism,
> it would be all too easy to end up on a system with fragmented memory that can't
> allocate the PAMTs post-boot.

For now, the post-boot runtime PAMT allocations are the one and only way
that TDX can be initialized. I pushed for it to be done this way.
Here's why:

Doing tdx_enable() is relatively slow and it eats up a non-zero amount
of physically contiguous RAM for metadata (~1/256th or ~0.4% of RAM).
Systems that support TDX but will never run TDX guests should not pay
that cost.
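
To put the ~1/256th figure in concrete terms, here is a rough sketch of the
arithmetic. The helper name and the fixed divisor are illustrative
assumptions, not code from this series; the real PAMT size depends on the
TDX module's reported metadata entry sizes.

```c
#include <stdint.h>

/*
 * Assumption: metadata costs roughly 1/256th of RAM, as estimated above.
 * 1/256 is 0.390625%, which rounds to the ~0.4% quoted.
 */
#define PAMT_RATIO_DIVISOR 256ULL

/* Hypothetical helper: rough PAMT footprint in bytes for a given RAM size. */
static uint64_t pamt_estimate_bytes(uint64_t ram_bytes)
{
	return ram_bytes / PAMT_RATIO_DIVISOR;
}
```

Under that assumption, a 1 TiB system would set aside about 4 GiB for
metadata, and a 256 GiB system about 1 GiB.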

That means that we either make folks opt-in at boot-time or we try to
make a best effort at runtime to do the metadata allocations.

From my perspective, the best-effort stuff is absolutely needed. Users
are going to forget the command-line opt in and there's no harm in
_trying_ the big allocations even if they fail.
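
A minimal userspace sketch of that best-effort, first-use pattern follows.
Everything in it is hypothetical (the struct, the state names, the
injectable allocator); it is not how tdx_enable() is implemented. Treating
a failed attempt as final is also just an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical states for a lazily initialized module. */
enum init_state { INIT_NONE, INIT_DONE, INIT_FAILED };

struct lazy_module {
	enum init_state state;
	void *metadata;		/* stand-in for the big metadata allocation */
	size_t metadata_size;
};

/* Allocator is injected so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Demonstration-only allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Try the large allocation on first use. Returns 0 on success, -1 on
 * failure. Nothing is allocated until someone actually asks, so systems
 * that never call this pay no cost.
 */
static int lazy_enable(struct lazy_module *m, alloc_fn alloc)
{
	if (m->state == INIT_DONE)
		return 0;
	/* Assumption: a failed attempt is final and is not retried. */
	if (m->state == INIT_FAILED)
		return -1;

	m->metadata = alloc(m->metadata_size);
	if (!m->metadata) {
		m->state = INIT_FAILED;
		return -1;
	}

	m->state = INIT_DONE;
	return 0;
}
```

The first caller pays for the allocation, later callers see the cached
result, and a failure costs nothing beyond an error return.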

Second, in reality, the "real" systems that can run TDX guests are
probably not going to sit around fragmenting memory for a month before
they run their first guest. They're going to run one shortly after they
boot when memory isn't fragmented and the best-effort allocation will
work really well.

Third, if anyone *REALLY* cared to make it reliable *and* wanted to sit
around fragmenting memory for a month, they could just start a TDX guest
and kill it to get TDX initialized. This isn't ideal. But, to me, it
beats defining some new, separate ABI (or boot/module option) to do it.

So, let's have those discussions. Long-term, what *is* the most
reliable way to get the TDX module loaded with 100% determinism? What
new ABI or interfaces are needed? Also, is that 100% determinism
required the moment this series is merged? Or, can we work up to it?

I think it can wait until this particular series is farther along.