RE: [PATCH] cpu-topology: warn if NUMA configurations conflicts with lower layer

From: Zengtao (B)
Date: Tue Jan 07 2020 - 21:19:22 EST


> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> Sent: Monday, January 06, 2020 10:31 PM
> To: Zengtao (B); Valentin Schneider; Sudeep Holla
> Cc: Linuxarm; Greg Kroah-Hartman; Rafael J. Wysocki;
> linux-kernel@xxxxxxxxxxxxxxx; Morten Rasmussen
> Subject: Re: [PATCH] cpu-topology: warn if NUMA configurations conflicts
> with lower layer
>
> On 06/01/2020 02:48, Zengtao (B) wrote:
>
> [...]
>
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> >> Sent: Saturday, January 04, 2020 1:21 AM
> >> To: Valentin Schneider; Zengtao (B); Sudeep Holla
> >> Cc: Linuxarm; Greg Kroah-Hartman; Rafael J. Wysocki;
> >> linux-kernel@xxxxxxxxxxxxxxx; Morten Rasmussen
> >> Subject: Re: [PATCH] cpu-topology: warn if NUMA configurations
> conflicts
> >> with lower layer
> >>
> >> On 03/01/2020 13:14, Valentin Schneider wrote:
> >>> On 03/01/2020 10:57, Valentin Schneider wrote:
>
> >> I still don't see the actual problem case. The closest I got is:
> >>
> >> qemu-system-aarch64 -kernel ... -append ' ... loglevel=8 sched_debug'
> >> -smp cores=4,sockets=2 ... -numa node,cpus=0-2,nodeid=0
> >> -numa node,cpus=3-7,nodeid=1
> >>
> >
> > It's related to the HW topology: if your hardware has two clusters
> > (0-3 and 4-7), you will see the issue with mainline qemu.
> > I think you can manually modify the MPIDR parsing to reproduce the
> > issue.
> > Linux will use the MPIDR to guess the MC topology since qemu currently
> > doesn't provide it.
> > Refer to: https://patchwork.ozlabs.org/cover/939301/
>
> That makes sense to me. Valentin and I already discussed this setup as a
> possible system where this issue can happen.
>
> I already suspected that virt machines only support a flat cpu topology.
> Good to know. Although I was able to pass '... -smp cores=8 -dtb
> foo.dtb ...' into mainline qemu to achieve a 2-cluster system (MC and
> DIE sd level) with an extra cpu-map entry in the dts file:
>
> cpu-map {
>         cluster0 {
>                 core0 {
>                         cpu = <&A53_0>;
>                 };
>                 ...
>         };
>
>         cluster1 {
>                 core0 {
>                         cpu = <&A53_4>;
>                 };
>                 ...
>         };
> };
>
> But I didn't succeed in combining this with the '... -numa
> node,cpus=0-3,nodeid=0 -numa node,cpus=4-7,nodeid=1 ...' params to
> create a system like yours.

I guess that you have used your own dtb, so maybe you need to specify the
numa-node-id property in the device tree.
Maybe you can refer to:
Documentation/devicetree/bindings/numa.txt
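
Something like the following, as a rough sketch based on that binding doc
(the cpu node names, node ids and distance values here are just placeholders
for a two-node setup, not taken from your dts):

cpus {
        cpu@0 {
                ...
                numa-node-id = <0>;
        };
        ...
        cpu@4 {
                ...
                numa-node-id = <1>;
        };
        ...
};

distance-map {
        compatible = "numa-distance-map-v1";
        distance-matrix = <0 0 10>,
                          <0 1 20>,
                          <1 0 20>,
                          <1 1 10>;
};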

>
> Your issue is related to the 'numa mask check for scheduler MC
> selection' functionality. It was introduced by commit 37c3ec2d810f and
> re-introduced later by commit e67ecf647020. I don't know why we need
> this functionality.
>
> How does your setup behave when you revert commit e67ecf647020? Or do
> you want an explicit warning in case of NUMA boundaries not respecting
> the physical topology?

I will need to have a look at commit e67ecf647020.
Thanks

Regards
Zengtao