Re: [PATCH] sched/topology: Fix overlapping sched_group build

From: Dietmar Eggemann
Date: Wed Mar 25 2020 - 07:06:00 EST


On 24.03.20 18:20, Valentin Schneider wrote:
>
> On Tue, Mar 24 2020, Valentin Schneider wrote:
>> Fix
>> ===
>>
>> Sanitize the groups we get out of build_group_from_child_sched_domain()
>> with the span of the domain we're currently building - this ensures the
>> groups we build only contain CPUs that are the right distance away from the
>> base CPU. This also requires modifying build_balance_mask().
>>
>
> I somehow missed that this triggers the WARN_ON_ONCE() in
> init_overlap_sched_group(). Gotta figure out why.

Cool, seems like qemu can emulate this!


qemu-system-aarch64 -kernel Image -hda ./qemu-image-aarch64.img -append
'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -smp cores=8
--nographic -m 512 -cpu cortex-a53 -machine virt -numa
node,cpus=0-1,nodeid=0 -numa node,cpus=2-3,nodeid=1, -numa
node,cpus=4-5,nodeid=2, -numa node,cpus=6-7,nodeid=3, -numa
dist,src=0,dst=1,val=12, -numa dist,src=0,dst=2,val=20, -numa
dist,src=0,dst=3,val=22, -numa dist,src=1,dst=2,val=22, -numa
dist,src=1,dst=3,val=24, -numa dist,src=2,dst=3,val=12

...
[ 0.711685]
[ 0.711767] 10 12 20 22
[ 0.711860] 12 10 22 24
[ 0.711917] 20 22 10 12
[ 0.711970] 22 24 12 10
[ 0.712036]
[ 0.718356] CPU0 attaching sched-domain(s):
[ 0.718433] domain-0: span=0-1 level=MC
[ 0.718646] groups: 0:{ span=0 cap=1006 }, 1:{ span=1 cap=1015 }
[ 0.718865] domain-1: span=0-3 level=NUMA
[ 0.718906] groups: 0:{ span=0-1 cap=2021 }, 2:{ span=2-3 cap=2014 }
[ 0.719044] domain-2: span=0-5 level=NUMA
[ 0.719082] groups: 0:{ span=0-3 cap=4035 }, 4:{ span=4-7 cap=3855 }
[ 0.719164] ERROR: groups don't span domain->span
[ 0.719191] domain-3: span=0-7 level=NUMA
[ 0.719228] groups: 0:{ span=0-5 mask=0-1 cap=5974 }, 6:{
span=4-7 mask=6-7 cap=3964 }
[ 0.719961] CPU1 attaching sched-domain(s):
...

cat /proc/sys/kernel/sched_domain/cpu0/domain*/{name,flags}
MC
NUMA
NUMA
NUMA
4655
25647
25647
25647