Re: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

From: Huang, Kai
Date: Wed Nov 23 2022 - 06:40:35 EST


On Tue, 2022-11-22 at 15:39 -0800, Dave Hansen wrote:
> On 11/20/22 16:26, Kai Huang wrote:
> > TDX provides increased levels of memory confidentiality and integrity.
> > This requires special hardware support for features like memory
> > encryption and storage of memory integrity checksums. Not all memory
> > satisfies these requirements.
> >
> > As a result, TDX introduced the concept of a "Convertible Memory Region"
> > (CMR). During boot, the firmware builds a list of all of the memory
> > ranges which can provide the TDX security guarantees. The list of these
> > ranges, along with TDX module information, is available to the kernel by
> > querying the TDX module via TDH.SYS.INFO SEAMCALL.
>
> I think the last sentence goes too far. What does it matter what the
> name of the SEAMCALL is? Who cares at this point? It's in the patch.
> Scroll down two pages if you really care.

I'll remove "via TDH.SYS.INFO SEAMCALL".

>
> > The host kernel can choose whether or not to use all convertible memory
> > regions as TDX-usable memory. Before the TDX module is ready to create
> > any TDX guests, the kernel needs to configure the TDX-usable memory
> > regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
> > module. Constructing the TDMR array requires information of both the
> > TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions. Call
> > TDH.SYS.INFO to get this information as a preparation.
>
> That last sentence is kinda goofy. I think there's a way to distill this
> whole thing down more effectively.
>
> CMRs tell the kernel which memory is TDX compatible. The kernel
> takes CMRs and constructs "TD Memory Regions" (TDMRs). TDMRs
> let the kernel grant TDX protections to some or all of the CMR
> areas.

Will do.

But it seems we should still mention "Constructing TDMRs requires information of
both the TDX module (TDSYSINFO_STRUCT) and the CMRs"? The reason is to justify
"use static to avoid having to pass them as function arguments when constructing
TDMRs" below.

>
> > Use static variables for both TDSYSINFO_STRUCT and CMR array to avoid
>
> I find it very useful to be precise when referring to code. Your code
> says 'tdsysinfo_struct', yet this says 'TDSYSINFO_STRUCT'. Why the
> difference?

Here I actually didn't intend to refer to any code. In the above paragraph
(that is going to be replaced with yours), I mentioned "TDSYSINFO_STRUCT" to
explain what "information of the TDX module" actually refers to, since
TDSYSINFO_STRUCT is the name used in the spec.

What's your preference?

>
> > having to pass them as function arguments when constructing the TDMR
> > array. And they are too big to be put to the stack anyway. Also, KVM
> > needs to use the TDSYSINFO_STRUCT to create TDX guests.
>
> This is also a great place to mention that the tdsysinfo_struct contains
> a *lot* of gunk which will not be used for a bit or that may never get
> used.

Perhaps below?

"Note many members in 'tdsysinfo_struct' are not used by the kernel".

Btw, may I ask why it matters?

[...]


> > +
> > +/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
> > +static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
> > +{
> > +	struct cmr_info *cmr;
> > +	int i, cmr_num;
> > +
> > +	/*
> > +	 * Intel TDX module spec, 20.7.3 CMR_INFO:
> > +	 *
> > +	 * TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
> > +	 * array of CMR_INFO entries. The CMRs are sorted from the
> > +	 * lowest base address to the highest base address, and they
> > +	 * are non-overlapping.
> > +	 *
> > +	 * This implies that BIOS may generate invalid empty entries
> > +	 * if total CMRs are less than 32. Need to skip them manually.
> > +	 *
> > +	 * CMR also must be 4K aligned. TDX doesn't trust BIOS. TDX
> > +	 * actually verifies CMRs before it gets enabled, so anything
> > +	 * doesn't meet above means kernel bug (or TDX is broken).
> > +	 */
>
> I dislike comments like this that describe all the code below. Can't
> you simply put the comment near the code that implements it?

Will do.

>
> > +	cmr = &cmr_array[0];
> > +	/* There must be at least one valid CMR */
> > +	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
> > +		goto err;
> > +
> > +	cmr_num = *actual_cmr_num;
> > +	for (i = 1; i < cmr_num; i++) {
> > +		struct cmr_info *cmr = &cmr_array[i];
> > +		struct cmr_info *prev_cmr = NULL;
> > +
> > +		/* Skip further empty CMRs */
> > +		if (is_cmr_empty(cmr))
> > +			break;
> > +
> > +		/*
> > +		 * Do sanity check anyway to make sure CMRs:
> > +		 *  - are 4K aligned
> > +		 *  - don't overlap
> > +		 *  - are in address ascending order.
> > +		 */
> > +		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
> > +			goto err;
>
> Why does cmr_array[0] get a pass on the empty and sanity checks?

TDX MCHECK verifies CMRs before enabling TDX, so there must be at least one
valid CMR.

And cmr_array[0] is checked before this loop.

>
> > +		prev_cmr = &cmr_array[i - 1];
> > +		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
> > +				cmr->base))
> > +			goto err;
> > +	}
> > +
> > +	/* Update the actual number of CMRs */
> > +	*actual_cmr_num = i;
>
> That comment is not helpful. Yes, this is literally updating the number
> of CMRs. Literally. That's the "what". But, the "why" is important.
> Why is it doing this?

When building the list of "TDX-usable" memory regions, the kernel verifies those
regions against CMRs to see whether they are truly convertible memory.

How about adding a comment like below:

	/*
	 * When the kernel builds the TDX-usable memory regions, it verifies
	 * they are truly convertible memory by checking them against CMRs.
	 * Update the actual number of CMRs to skip those empty CMRs.
	 */

Also, I think printing CMRs in dmesg is helpful, and printing empty (all-zero)
CMRs would only add meaningless lines to the log.
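
For illustration only, the "check a region against CMRs" step described above
could look roughly like the userspace sketch below. It is not the patch code:
range_in_cmrs() is a hypothetical name, and requiring a region to sit inside a
single CMR is a simplifying assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the CMR_INFO entry: base and size in bytes. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if [start, end) lies entirely inside one of the first
 * 'nr_cmrs' entries. Passing the trimmed count keeps the loop away
 * from the empty tail entries.
 */
static bool range_in_cmrs(uint64_t start, uint64_t end,
			  const struct cmr_info *cmrs, int nr_cmrs)
{
	int i;

	assert(start < end);

	for (i = 0; i < nr_cmrs; i++) {
		uint64_t cmr_end = cmrs[i].base + cmrs[i].size;

		if (start >= cmrs[i].base && end <= cmr_end)
			return true;
	}

	return false;
}
```

With the trimmed count as 'nr_cmrs', a caller never has to special-case the
all-zero entries at the end of the array.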

>
> > +	/* Print kernel checked CMRs */
> > +	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
>
> This is the point where I start to lose patience with these comments.
> These are just a waste of space.

Sorry will remove.

>
> Also, I saw the loop above check 'cmr_num' CMRs for is_cmr_ok(). Now,
> it'll print an 'actual_cmr_num=1' number of CMRs as being
> "kernel-checked". Why? That makes zero sense.

The loop quits when it sees an empty CMR. I think there's no need to check
further CMRs as they must be empty (TDX MCHECK verifies CMRs).

>
> > +	return 0;
> > +err:
> > +	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
> > +	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
> > +	return -EINVAL;
> > +}
> > +
> > +static int tdx_get_sysinfo(void)
> > +{
> > +	struct tdx_module_output out;
> > +	int ret;
> > +
> > +	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
> > +
> > +	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
> > +			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* R9 contains the actual entries written to the CMR array. */
> > +	tdx_cmr_num = out.r9;
> > +
> > +	pr_info("TDX module: attributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
> > +		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
>
> This is a case where a little bit of vertical alignment will go a long way:
>
> > +		tdx_sysinfo.attributes,    tdx_sysinfo.vendor_id,
> > +		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
> > +		tdx_sysinfo.build_date,    tdx_sysinfo.build_num);

Thanks will do.

>
> > +
> > +	/*
> > +	 * trim_empty_cmrs() updates the actual number of CMRs by
> > +	 * dropping all tail empty CMRs.
> > +	 */
> > +	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
> > +}
>
> Why does this both need to respect the "tdx_cmr_num = out.r9" value
> *and* trim the empty ones? Couldn't it just ignore the "tdx_cmr_num =
> out.r9" value and just trim the empty ones either way? It's not like
> there is a billion of them. It would simplify the code for sure.

OK. Since the spec says MAX_CMRS is 32, I can use 32 instead of reading the
count out of R9.
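
As a rough illustration of scanning all MAX_CMRS slots without R9, here is a
minimal userspace sketch. It is not the patch code: count_valid_cmrs() and the
helpers are hypothetical names, and treating "size == 0" as the empty marker
is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CMRS	32
#define CMR_ALIGN	4096ULL

/* Simplified stand-in for the CMR_INFO entry: base and size in bytes. */
struct cmr_info {
	uint64_t base;
	uint64_t size;
};

/* Assumption for this sketch: an unused slot has size 0. */
static int cmr_is_empty(const struct cmr_info *cmr)
{
	return cmr->size == 0;
}

static int cmr_is_aligned(const struct cmr_info *cmr)
{
	return (cmr->base % CMR_ALIGN) == 0 && (cmr->size % CMR_ALIGN) == 0;
}

/*
 * Walk all MAX_CMRS slots. Return the number of leading valid CMRs,
 * or -1 if the array is malformed: no valid CMR at all, a misaligned
 * entry, entries that overlap or are out of order, or a non-empty
 * entry after the first empty one.
 */
static int count_valid_cmrs(const struct cmr_info *cmrs)
{
	int i, nr;

	assert(cmrs != NULL);

	for (i = 0; i < MAX_CMRS; i++) {
		if (cmr_is_empty(&cmrs[i]))
			break;
		if (!cmr_is_aligned(&cmrs[i]))
			return -1;
		if (i > 0 && cmrs[i - 1].base + cmrs[i - 1].size > cmrs[i].base)
			return -1;
	}

	nr = i;
	if (nr == 0)
		return -1;

	/* Every slot after the first empty one must be empty too. */
	for (; i < MAX_CMRS; i++) {
		if (!cmr_is_empty(&cmrs[i]))
			return -1;
	}

	return nr;
}
```

Because every slot is visited, the count that gets printed and the set of
entries that were actually checked are the same thing.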

[...]

> > +struct cpuid_config {
> > +	u32 leaf;
> > +	u32 sub_leaf;
> > +	u32 eax;
> > +	u32 ebx;
> > +	u32 ecx;
> > +	u32 edx;
> > +} __packed;
> > +
> > +#define TDSYSINFO_STRUCT_SIZE 1024
> > +#define TDSYSINFO_STRUCT_ALIGNMENT 1024
> > +
> > +struct tdsysinfo_struct {
> > +	/* TDX-SEAM Module Info */
> > +	u32 attributes;
> > +	u32 vendor_id;
> > +	u32 build_date;
> > +	u16 build_num;
> > +	u16 minor_version;
> > +	u16 major_version;
> > +	u8 reserved0[14];
> > +	/* Memory Info */
> > +	u16 max_tdmrs;
> > +	u16 max_reserved_per_tdmr;
> > +	u16 pamt_entry_size;
> > +	u8 reserved1[10];
> > +	/* Control Struct Info */
> > +	u16 tdcs_base_size;
> > +	u8 reserved2[2];
> > +	u16 tdvps_base_size;
> > +	u8 tdvps_xfam_dependent_size;
> > +	u8 reserved3[9];
> > +	/* TD Capabilities */
> > +	u64 attributes_fixed0;
> > +	u64 attributes_fixed1;
> > +	u64 xfam_fixed0;
> > +	u64 xfam_fixed1;
> > +	u8 reserved4[32];
> > +	u32 num_cpuid_config;
> > +	/*
> > +	 * The actual number of CPUID_CONFIG depends on above
> > +	 * 'num_cpuid_config'. The size of 'struct tdsysinfo_struct'
> > +	 * is 1024B defined by TDX architecture. Use a union with
> > +	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
> > +	 * equal to 1024.
> > +	 */
> > +	union {
> > +		struct cpuid_config cpuid_configs[0];
> > +		u8 reserved5[892];
> > +	};
>
> Can you double check what the "right" way to do variable arrays is these
> days? I thought the [0] method was discouraged.
>
> Also, it isn't *really* 892 bytes of reserved space, right? Anything
> that's not cpuid_configs[] is reserved, I presume. Could you try to be
> more precise there?

I'll look into this first and get back to you. Thanks.

The intention is to make sure the structure size is 1024B, so that the static
variable has enough space for the TDX module to write into.
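
For reference, the 892 comes from the layout quoted above: the fields before
the union add up to 32 + 16 + 16 + 32 + 32 + 4 = 132 bytes, and 1024 - 132 =
892. A minimal userspace sketch of one way to keep the 1024B size without a
[0] array is below. It is only an illustration under assumptions, not a
proposal for the patch: struct sysinfo_sketch, the opaque 'fixed_fields' blob
and sysinfo_get_cpuid_config() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYSINFO_SIZE		1024
#define SYSINFO_FIXED_SIZE	128	/* bytes before num_cpuid_config */

struct cpuid_config {
	uint32_t leaf;
	uint32_t sub_leaf;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
};

/* Simplified stand-in: the fixed fields are kept as an opaque blob. */
struct sysinfo_sketch {
	uint8_t  fixed_fields[SYSINFO_FIXED_SIZE];
	uint32_t num_cpuid_config;
	/* CPUID_CONFIG entries come first; whatever follows is reserved. */
	uint8_t  cpuid_area[SYSINFO_SIZE - SYSINFO_FIXED_SIZE - sizeof(uint32_t)];
};

static_assert(sizeof(struct sysinfo_sketch) == SYSINFO_SIZE,
	      "sketch must be exactly 1024 bytes");
static_assert(offsetof(struct sysinfo_sketch, cpuid_area) == 132,
	      "CPUID area starts right after num_cpuid_config");

/* How many whole entries fit into the trailing area. */
#define MAX_CPUID_CONFIGS \
	(sizeof(((struct sysinfo_sketch *)0)->cpuid_area) / \
	 sizeof(struct cpuid_config))

/* Copy entry 'idx' out of the byte area. Returns 0, or -1 if out of range. */
static int sysinfo_get_cpuid_config(const struct sysinfo_sketch *si,
				    uint32_t idx, struct cpuid_config *out)
{
	if (idx >= si->num_cpuid_config || idx >= MAX_CPUID_CONFIGS)
		return -1;

	memcpy(out, si->cpuid_area + idx * sizeof(*out), sizeof(*out));
	return 0;
}
```

The size of the trailing area is derived from the total size rather than
written as a magic number, and the compile-time checks catch a layout change
that would break the 1024B requirement.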