Re: [PATCH] x86/smpboot: Make logical package management more robust

From: Thomas Gleixner
Date: Sat Dec 10 2016 - 14:16:26 EST


On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
> On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> > On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
> > > On Thu, 8 Dec 2016, Thomas Gleixner wrote:
> > >
> > > Boris, can you please verify if that makes the
> > > topology_update_package_map() call which you placed into the Xen cpu
> > > starting code obsolete ?
> >
> > Will do. I did test your patch but without removing
> > topology_update_package_map() call. It complained about package IDs
> > being wrong, but that's expected until I fix Xen part.
>
> Ignore my statement about earlier testing --- it was all on single-node
> machines.
>
> Something is broken with multi-node on Intel, but failure modes are different.
> Prior to this patch build_sched_domain() reports an error and pretty soon we
> crash in scheduler (don't remember off the top of my head). With patch applied
> I crash much later, when one of the drivers does kmalloc_node(..,
> cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
> ("x86: Booted up 1 node, 32 CPUs" is reported, for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

> 2-node AMD box doesn't have these problems.
>
> I haven't upgraded the Intel machine for about a month but this all must have
> happened in 4.9 timeframe.
>
> So I can't answer your question since we clearly have other problems on Xen. I
> will be looking into this.

Fair enough. What you could do though, with this patch applied and the extra
Xen call to topology_update_package_map() removed, is to watch out for the
following messages:

pr_info("Max logical packages: %u\n", __max_logical_packages);

and

pr_warn("CPU %u Converting physical %u to logical package %u\n", ...)

Ideally the latter won't show.
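
If it helps, checking a saved boot log for those two messages could be
scripted roughly as below. This is only a sketch: the function name
scan_kernel_log() is made up for illustration, and it assumes the log lines
contain the message text exactly as in the two format strings above, possibly
preceded by a timestamp and a subsystem prefix.

```python
import re

# Patterns derived from the two format strings quoted above. Any timestamp
# or prefix in front of the message is ignored because search() is unanchored.
MAX_PKG_RE = re.compile(r"Max logical packages: (\d+)")
CONVERT_RE = re.compile(
    r"CPU (\d+) Converting physical (\d+) to logical package (\d+)"
)


def scan_kernel_log(text):
    """Return (max_packages, conversions) found in the kernel log text.

    max_packages is the last reported value, or None if the message is absent.
    conversions is a list of (cpu, physical_pkg, logical_pkg) tuples, one per
    "Converting physical" warning; an empty list is the ideal outcome.
    """
    max_packages = None
    conversions = []
    for line in text.splitlines():
        match = MAX_PKG_RE.search(line)
        if match:
            max_packages = int(match.group(1))
            continue
        match = CONVERT_RE.search(line)
        if match:
            conversions.append(tuple(int(g) for g in match.groups()))
    return max_packages, conversions
```

Feed it the captured dmesg output as a string. A non-empty conversions list
means the warning did show up on that boot.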

Thanks,

tglx