Re: [PATCH] x86/platform/uv: Abort UV initialization when reduced nr_cpus requires it

From: Steve Wahl
Date: Wed Jul 12 2023 - 17:19:29 EST


On Tue, Jul 11, 2023 at 04:07:55PM -0700, Dave Hansen wrote:
> On 7/11/23 13:26, Steve Wahl wrote:
> > When nr_cpus is set to a smaller number than actually present, there
> > is some node-to-socket mapping info we won't get access to in
>
> First of all, no "we's" in commit messages.
>
> > https://www.kernel.org/doc/html/next/process/maintainer-tip.html#changelog

Ah, I was trying to be imperative in the description of what to do,
but didn't understand it applied as much to the description of what
happened in the past that needs to be fixed. I will fix this.

> > build_socket_tables(). This could later result in using a -1 value
> > for some array indexing, and eventual kernel page faults.
> >
> > To avoid this, if any unfilled table entries are found, print a
> > warning message, and resume initializing, acting as if this is not a
> > UV system. UV features will be unavailable, but we will not cause
> > kernel dumps.
> >
> > This is a condition we expect only in platform debugging situations,
> > not in day-to-day operation.
>
> This seems like a hack.
>
> The real problem is that you've got an online Linux NUMA node with no
> CPUs. uv_system_init_hub() (probably) goes and does:
>
> > for_each_node(nodeid)
> > __uv_hub_info_list[nodeid] = uv_hub_info_list_blade[uv_node_to_blade_id(nodeid)];
>
> But the node=>blade lookup uses socket numbers. No CPUs means no socket
> numbers. You _have_ the blade information _somewhere_. Is there really
> no other way to map it to a NUMA node than using the CPU apicid?

I will see if I can find a better place to obtain this information.
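
To make the failure mode discussed above concrete, here is a rough
userspace sketch, not the actual kernel code. Every name in it
(SOCK_EMPTY, table_fully_populated(), node_to_blade()) is made up for
illustration, and it only assumes what the thread describes: table
entries that were never filled in hold -1, and using one as an array
index is what eventually faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a table entry that was never filled in. */
#define SOCK_EMPTY (-1)

/*
 * Hypothetical stand-in for the "any unfilled entries?" check.
 * Returns false as soon as one entry still holds SOCK_EMPTY.
 */
static bool table_fully_populated(const int *node_to_socket, size_t n_nodes)
{
	for (size_t i = 0; i < n_nodes; i++)
		if (node_to_socket[i] == SOCK_EMPTY)
			return false;
	return true;
}

/*
 * Hypothetical node => socket => blade lookup. Instead of indexing
 * socket_to_blade[] with -1, it returns -1 when the socket number for
 * the node is missing or out of range.
 */
static int node_to_blade(const int *node_to_socket, size_t n_nodes,
			 const int *socket_to_blade, size_t n_sockets,
			 size_t node)
{
	assert(node < n_nodes);

	int sock = node_to_socket[node];

	if (sock < 0 || (size_t)sock >= n_sockets)
		return -1;
	return socket_to_blade[sock];
}
```

With a table such as { 0, SOCK_EMPTY, 1 }, the middle node has no
socket number, so the populated check fails and the lookup for that
node reports -1 rather than reading outside the array.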

Thank you.

--> Steve Wahl

--
Steve Wahl, Hewlett Packard Enterprise