Re: [PATCH v7 10/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory

From: Huang, Ying
Date: Wed Nov 23 2022 - 19:48:59 EST


"Huang, Kai" <kai.huang@xxxxxxxxx> writes:

>> > > > +/*
>> > > > + * Add all memblock memory regions to the @tdx_memlist as TDX memory.
>> > > > + * Must be called when get_online_mems() is called by the caller.
>> > > > + */
>> > > > +static int build_tdx_memory(void)
>> > > > +{
>> > > > +	unsigned long start_pfn, end_pfn;
>> > > > +	int i, nid, ret;
>> > > > +
>> > > > +	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
>> > > > +		/*
>> > > > +		 * The first 1MB may not be reported as TDX convertible
>> > > > +		 * memory. Manually exclude them as TDX memory.
>> > > > +		 *
>> > > > +		 * This is fine as the first 1MB is already reserved in
>> > > > +		 * reserve_real_mode() and won't end up to ZONE_DMA as
>> > > > +		 * free page anyway.
>> > > > +		 */
>> > > > +		start_pfn = max(start_pfn, (unsigned long)SZ_1M >> PAGE_SHIFT);
>> > > > +		if (start_pfn >= end_pfn)
>> > > > +			continue;
>> > >
>> > > How about checking whether the first 1MB is reserved instead of depending on
>> > > the corresponding code not being changed? Via for_each_reserved_mem_range()?
>> >
>> > IIUC, some reserved memory can be freed to the page allocator directly, e.g. kernel
>> > init code/data. I feel it's not safe to assume that reserved memory will never
>> > be in the page allocator. Otherwise we could use for_each_free_mem_range().
>>
>> Yes. memblock reserve information isn't perfect. But I still think
>> that checking whether the first 1MB is reserved in memblock is better
>> than just assuming it. Or, we can check whether the pages of the
>> first 1MB are reserved by checking struct page directly?
>>
>
> Sorry, I am a little bit confused about what you want to achieve here. Do you
> want to add a sanity check to make sure the first 1MB is indeed not in the page
> allocator?
>
> IIUC, it is indeed true. Please see the comment of calling reserve_real_mode()
> in setup_arch(). Also please see efi_free_boot_services(), which doesn't free
> the boot service if it is below 1MB.
>
> Also, my understanding is kernel's intention is to always reserve the first 1MB:
>
> /*
> * Don't free memory under 1M for two reasons:
> * - BIOS might clobber it
> * - Crash kernel needs it to be reserved
> */
>
> So if any page in the first 1MB ended up in the page allocator, it should be a
> kernel bug which is not related to TDX, correct?

I suggest adding some code to verify this. The code that reserves the
first 1MB may be changed in the future (although the possibility is low),
and the TDX code may not be changed at the same time. The verification
code here can then catch that, so we can make changes accordingly.

Best Regards,
Huang, Ying