Re: [PATCH v3] mm: fix panic in __alloc_pages

From: Michal Hocko
Date: Thu Nov 18 2021 - 03:36:09 EST


On Tue 16-11-21 20:22:49, Alexey Makhalov wrote:
>
>
> > On Nov 16, 2021, at 1:17 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> > [...]
> >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> >> index 6737b1cbf..bbc1a70d5 100644
> >> --- a/drivers/acpi/acpi_processor.c
> >> +++ b/drivers/acpi/acpi_processor.c
> >> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >>  	 * gets online for the first time.
> >>  	 */
> >>  	pr_info("CPU%d has been hot-added\n", pr->id);
> >> +	{
> >> +		int nid = cpu_to_node(pr->id);
> >> +		printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
> >> +	}
> >>  	pr->flags.need_hotplug_init = 1;
> >
> > OK, IIUC you are adding a processor which is outside of
> > possible_cpu_mask and that means that the node is not allocated for such
> > a future to be hotplugged cpu and its memory node. init_cpu_to_node
> > would have done that initialization otherwise.
> It is not correct.
>
> possible_cpus is 128 for this VM. Look at SRAT and percpu output for proof.
> [ 0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
> [ 0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

OK, I see. I have missed that when looking at the boot log you have
sent.

> It is impossible to add processor outside of possible_cpu_mask. possible_cpus is absolute maximum
> that system can support. See Documentation/core-api/cpu_hotplug.rst

That was my understanding, hence the suspicion that you might be doing
something that is not really supported.

> Number of present and onlined CPUs (and nodes) is 4. Other 124 CPUs (and nodes) are not present, but can
> be potentially hot added.

Yes this is a configuration I have already seen. The cpu->node binding
was configured during the boot time though IIRC.

> Number of initialized nodes is 4, as init_cpu_to_node() will skip not yet present nodes,
> see arch/x86/mm/numa.c:798 (numa_cpu_node(CPU #4) == NUMA_NO_NODE)

Isn't this the problem? Why is the cpu->node association missing here?

> 788 void __init init_cpu_to_node(void)
> 789 {
> 790         int cpu;
> 791         u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
> 792
> 793         BUG_ON(cpu_to_apicid == NULL);
> 794
> 795         for_each_possible_cpu(cpu) {
> 796                 int node = numa_cpu_node(cpu);
> 797
> 798                 if (node == NUMA_NO_NODE)
> 799                         continue;
> 800
>
> After CPU (and node) hot plug:
> - CPU 4 is marked as present, but not yet online
> - New node got ID 4. numa_cpu_node(CPU #4) returns 4
> - node_online(4) == 0 and NODE_DATA(4) == NULL, but it will be accessed inside
> for_each_possible_cpu loop in percpu allocation.
>
> Digging further.
> Even if the x86/CPU hot add maintainers decide to clean up the memoryless node hot add code to initialize the node at the time of
> attaching it (to be aligned with what mm does for a node on memory hot add), this percpu fix is still needed as it is used during
> the node onlining. See the chicken and egg problem that I described above.

I have to say I do not see the chicken and egg problem. As long as
init_cpu_to_node initializes the memoryless node for the cpu properly
then the pcp allocator doesn't really have to care as the page allocator
falls back to the first populated node in distance order. So I believe
the whole issue boils down to addressing why init_cpu_to_node doesn't
see a proper cpu->node association.
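
To illustrate what I mean, here is a toy user-space model of that
fallback. It is only a sketch and not kernel code: struct
model_topology, model_pick_node() and NO_NODE are names made up for
this mail, "initialized" stands in for NODE_DATA(nid) != NULL, and the
real allocator walks a prebuilt zonelist rather than scanning a
distance table. The point is only that an initialized memoryless node
still resolves to its nearest populated neighbour, while a node that
was never initialized has nothing to fall back from.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical size, for the model only */
#define NO_NODE   (-1)	/* stands in for NUMA_NO_NODE */

struct model_node {
	bool initialized;	/* models NODE_DATA(nid) != NULL */
	bool has_memory;	/* models a node with populated zones */
};

struct model_topology {
	struct model_node node[MAX_NODES];
	int distance[MAX_NODES][MAX_NODES];	/* SLIT-like table */
};

/*
 * Return the closest initialized node with memory as seen from @nid,
 * or NO_NODE if @nid itself was never initialized or if no node has
 * memory. Ties are resolved in favour of the lowest node id.
 */
static int model_pick_node(const struct model_topology *t, int nid)
{
	int best = NO_NODE;
	int n;

	assert(nid >= 0 && nid < MAX_NODES);

	/* The case from this thread: there is no pgdat for the node. */
	if (!t->node[nid].initialized)
		return NO_NODE;

	for (n = 0; n < MAX_NODES; n++) {
		if (!t->node[n].initialized || !t->node[n].has_memory)
			continue;
		if (best == NO_NODE ||
		    t->distance[nid][n] < t->distance[nid][best])
			best = n;
	}

	return best;
}
```

In this model the caller never needs to special-case a memoryless
node; the only state it cannot cope with is the one where the node was
not initialized at all.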

> Or, as a 2nd option, numa_cpu_node(4) should return NUMA_NO_NODE until node 4 gets fully initialized.
--
Michal Hocko
SUSE Labs