Re: [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand

From: Kai Huang
Date: Thu Apr 28 2022 - 21:00:01 EST


On Thu, 2022-04-28 at 17:26 -0700, Dave Hansen wrote:
> On 4/28/22 17:11, Kai Huang wrote:
> > This is true. So I think w/o taking the lock is also fine, as the TDX module
> > initialization is a state machine. If any cpu goes offline during logical-cpu
> > level initialization and TDH.SYS.LP.INIT isn't done on that cpu, then later the
> > TDH.SYS.CONFIG will fail. Similarly, if any cpu going offline causes
> > TDH.SYS.KEY.CONFIG is not done for any package, then TDH.SYS.TDMR.INIT will
> > fail.
>
> Right. The worst-case scenario is someone is mucking around with CPU
> hotplug during TDX initialization is that TDX initialization will fail.
>
> We *can* fix some of this at least and provide coherent error messages
> with a pattern like this:
>
> cpus_read_lock();
> // check that all MADT-enumerated CPUs are online
> tdx_init();
> cpus_read_unlock();
>
> That, of course, *does* prevent CPUs from going offline during
> tdx_init(). It also provides a nice place for an error message:
>
> pr_warn("You offlined a CPU then want to use TDX? Sod off.\n");

Yes this is better.

The problem is how to check MADT-enumerated CPUs are online?

I checked the code, and it seems we can use 'num_processors + disabled_cpus' as
MADT-enumerated CPUs? In fact, there should be no 'disabled_cpus' for TDX, so I
think:

if (disabled_cpus || num_processors != num_online_cpus()) {
pr_err("Initializing the TDX module requires all MADT-
enumerated CPUs being onine.");
return -EINVAL;
}

But I may have misunderstanding.

>
> > A problem (I realized it exists in current implementation too) is shutting down
> > the TDX module, which requires calling TDH.SYS.LP.SHUTDOWN on all BIOS-enabled
> > cpus. Kernel can do this SEAMCALL at most for all present cpus. However when
> > any cpu is offline, this SEAMCALL won't be called on it, and it seems we need to
> > add new CPU hotplug callback to call this SEAMCALL when the cpu is online again.
>
> Hold on a sec. If you call TDH.SYS.LP.SHUTDOWN on any CPU, then TDX
> stops working everywhere, right?  
>

Yes.

But tot shut down the TDX module, it's better to call LP.SHUTDOWN on all
logical cpus as suggested by spec.

> But, if someone offlines one CPU, we
> don't want TDX to stop working everywhere.

Right. I am talking about when initializing fails due to any reason (i.e. -
ENOMEM), currently we shutdown the TDX module. When shutting down the TDX
module, we want to call LP.SHUTDOWN on all logical cpus. If there's any CPU
being offline when we do the shutdown, then LP.SHUTDOWN won't be called for that
cpu.

But as you suggested above, if we have an early check whether all MADT-
enumerated CPUs are online and if not we return w/o shutting down the TDX
module, then if we shutdown the module the LP.SHUTDOWN will be called on all
cpus.