RE: [PATCH] x86/mm: avoid truncating memblocks for SGX memory

From: Du, Fan
Date: Fri Jun 18 2021 - 04:44:56 EST




>-----Original Message-----
>From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>Sent: Friday, June 18, 2021 3:47 AM
>To: linux-mm@xxxxxxxxx
>Cc: linux-kernel@xxxxxxxxxxxxxxx; Dave Hansen
><dave.hansen@xxxxxxxxxxxxxxx>; Du, Fan <fan.du@xxxxxxxxx>; Chatre,
>Reinette <reinette.chatre@xxxxxxxxx>; jarkko@xxxxxxxxxx; Williams, Dan J
><dan.j.williams@xxxxxxxxx>; Hansen, Dave <dave.hansen@xxxxxxxxx>;
>x86@xxxxxxxxxx; linux-sgx@xxxxxxxxxxxxxxx; luto@xxxxxxxxxx;
>peterz@xxxxxxxxxxxxx
>Subject: [PATCH] x86/mm: avoid truncating memblocks for SGX memory
>
>
>From: Fan Du <fan.du@xxxxxxxxx>
>
>tl;dr:
>
>Several SGX users reported seeing the following message on NUMA systems:
>
> sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback
>to the NUMA node 0.
>
>This turned out to be the 'memblock' code mistakenly throwing away
>SGX memory.
>
>=== Full Changelog ===
>
>The 'max_pfn' variable represents the highest known RAM address. It can
>be used, for instance, to quickly determine for which physical addresses
>there is mem_map[] space allocated. The numa_meminfo code makes an
>effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
>
>SGX memory is not considered RAM (it is marked as "Reserved" in the
>e820) and is not taken into account by max_pfn. Despite this, SGX
>memory areas have NUMA affinity and are enumerated in the ACPI SRAT.
>The existing SGX code uses the numa_meminfo mechanism to look up the
>NUMA affinity for its memory areas.
>
>In cases where SGX memory was above max_pfn (usually just the one EPC
>section in the last highest NUMA node), the numa_memblock is truncated
>at 'max_pfn', which is below the SGX memory. When the SGX code tries to
>look up the affinity of this memory, it fails and produces an error message:
>
> sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback
>to the NUMA node 0.
>
>and assigns the memory to NUMA node 0.
>
>Instead of silently truncating the memory block at 'max_pfn' and
>dropping the SGX memory, add the truncated portion to
>'numa_reserved_meminfo'. This allows the SGX code to later determine
>the NUMA affinity of its 'Reserved' area.
>
>Without this patch, numa_meminfo looks like this (from 'crash'):
>
> blk = { start = 0x0, end = 0x2080000000, nid = 0x0 }
> { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
>
>numa_reserved_meminfo is empty.
>
>After the patch, numa_meminfo looks like this:
>
> blk = { start = 0x0, end = 0x2080000000, nid = 0x0 }
> { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
>
>and numa_reserved_meminfo has an entry for node 1's SGX memory:
>
> blk = { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
>
> [ daveh: completely rewrote/reworked changelog ]

Really what's your PROBLEM?!
Neither did I ask you to send my patch, nor do I agree to change it.
Who grant you the right to do this ?!
It's disgraceful to do this w/o my notice.

If you have comments, please DO align with the other two maintainers Jarkko and Dan first,
who already reviewed the patch in this format.

https://lkml.org/lkml/2021/6/17/1151



>Signed-off-by: Fan Du <fan.du@xxxxxxxxx>
>Reported-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
>Reviewed-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
>Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxx>
>Fixes: 5d30f92e7631 ("x86/NUMA: Provide a range-to-target_node lookup
>facility")
>Cc: x86@xxxxxxxxxx
>Cc: linux-sgx@xxxxxxxxxxxxxxx
>Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>---
>
> b/arch/x86/mm/numa.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
>diff -puN arch/x86/mm/numa.c~sgx-srat arch/x86/mm/numa.c
>--- a/arch/x86/mm/numa.c~sgx-srat 2021-06-17 11:23:05.116159990 -0700
>+++ b/arch/x86/mm/numa.c 2021-06-17 11:55:46.117155100 -0700
>@@ -254,7 +254,13 @@ int __init numa_cleanup_meminfo(struct n
>
> /* make sure all non-reserved blocks are inside the limits */
> bi->start = max(bi->start, low);
>- bi->end = min(bi->end, high);
>+
>+ /* preserve info for non-RAM areas above 'max_pfn': */
>+ if (bi->end > high) {
>+ numa_add_memblk_to(bi->nid, high, bi->end,
>+ &numa_reserved_meminfo);
>+ bi->end = high;
>+ }
>
> /* and there's no empty block */
> if (bi->start >= bi->end)
>_