Re: [PATCH RFT v2] x86: move cacheinfo sysfs to generic cacheinfo infrastructure

From: Borislav Petkov
Date: Tue Mar 03 2015 - 13:46:45 EST


@Tejun: scroll to the end :)

On Sun, Mar 01, 2015 at 10:39:24PM +0000, Andre Przywara wrote:
> So I gave that a spin on my old PhenomII X6 box, and indeed there is a
> NULL pointer dereference. I tracked it down to
> __cache_amd_cpumap_setup(), where the sibling map for the L3 is
> populated. When the function is called for the first CPU, subsequent
> CPU's cacheinfo structures obviously haven't been initialized yet, so

Right, let me try to understand the splat:

[ 3.030598] RIP: 0010:[<ffffffff810141c0>] [<ffffffff810141c0>] _populate_cache_leaves+0x440/0x460

This is:

9d6: 49 0f a3 06 bt %rax,(%r14)
9da: 19 c9 sbb %ecx,%ecx
9dc: 85 c9 test %ecx,%ecx
9de: 74 d0 je 9b0 <_populate_cache_leaves+0x410>
9e0: f0 49 0f ab 84 24 f8 lock bts %rax,0xf8(%r12) <----
9e7: 00 00 00
9ea: eb c4 jmp 9b0 <_populate_cache_leaves+0x410>
9ec: 4c 8b 65 98 mov -0x68(%rbp),%r12
9f0: e9 eb fd ff ff jmpq 7e0 <_populate_cache_leaves+0x240>
9f5: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)

%r12 is 0 and that is this hunk in __cache_amd_cpumap_setup():

} else if (index == 3) {
for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
this_cpu_ci = get_cpu_cacheinfo(i);
this_leaf = this_cpu_ci->info_list + index;
for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
if (!cpu_online(sibling))
continue;
cpumask_set_cpu(sibling, <--- set_bit() does LOCK
&this_leaf->shared_cpu_map);
}
}

so this_leaf is NULL.

We get it by doing

this_leaf = this_cpu_ci->info_list + index;

and this_cpu_ci is a per-CPU variable.

Now, previously the code did

- if (!per_cpu(ici_cpuid4_info, i))
- continue;


and __cache_cpumap_setup() already does:

if (i == cpu || !sib_cpu_ci->info_list)
continue;/* skip if itself or no cacheinfo */

so maybe we should do that too in __cache_amd_cpumap_setup():

if (!this_cpu_ci->info_list)
continue;

for the index == 3 case?

It boots fine here with that change and it is consistent with the
previous code.

And yes, the x86 cacheinfo code could use a serious rubbing and cleanup.

Btw, this patch introduces a bunch of new sysfs nodes in the caches
hierarchy:

--- caches-guest-before.txt 2015-03-03 15:11:09.168276423 +0100
+++ caches-guest-after.txt 2015-03-03 18:19:04.084426130 +0100
@@ -1,6 +1,22 @@
+/sys/devices/system/cpu/cpu0/cache/power/control:1:auto
+/sys/devices/system/cpu/cpu0/cache/power/async:1:disabled
+/sys/devices/system/cpu/cpu0/cache/power/runtime_enabled:1:disabled
+/sys/devices/system/cpu/cpu0/cache/power/runtime_active_kids:1:0
+/sys/devices/system/cpu/cpu0/cache/power/runtime_active_time:1:0
+/sys/devices/system/cpu/cpu0/cache/power/runtime_status:1:unsupported
+/sys/devices/system/cpu/cpu0/cache/power/runtime_usage:1:0
+/sys/devices/system/cpu/cpu0/cache/power/runtime_suspended_time:1:0
/sys/devices/system/cpu/cpu0/cache/index0/size:1:64K
/sys/devices/system/cpu/cpu0/cache/index0/type:1:Data
/sys/devices/system/cpu/cpu0/cache/index0/level:1:1
+/sys/devices/system/cpu/cpu0/cache/index0/power/control:1:auto
+/sys/devices/system/cpu/cpu0/cache/index0/power/async:1:disabled
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_enabled:1:disabled
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_active_kids:1:0
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_active_time:1:0
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_status:1:unsupported
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_usage:1:0
+/sys/devices/system/cpu/cpu0/cache/index0/power/runtime_suspended_time:1:0
/sys/devices/system/cpu/cpu0/cache/index0/number_of_sets:1:512
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map:1:1
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:1:0
...

What do those things mean? runtime_active_kids ?? Kids are active during
runtime?! Well, that's a given, no need for a sysfs node for that :-)

> > Indeed I see a regression between 3.19.0 and a clean 4.0.0-rc1 in this
> > respect, the diff is like:
> > -cpu5/cache/index3/shared_cpu_map:00000000,0000003f
> > +cpu5/cache/index3/shared_cpu_map:3f
> > (for each cpu and index)
> >
> > Can someone confirm (and investigate) this?
>
> I saw this even on ARM64 platforms with 4.0-rc1 and did narrow down to
> series of formatting functions changes introduced by Tejun Heo, mainly
> commit 4a0792b0e7a6("bitmap: use %*pb[l] to print bitmaps including
> cpumasks and nodemasks")

Strictly speaking, this is a regression if userspace is parsing those
masks. I hardly doubt anything is doing that but if we had to be
completely strict we should add back the zeroes.

Tejun, we have this change in user-visible masks formatting in sysfs
after your bitmaps printing changes:

-cpu5/cache/index3/shared_cpu_map:00000000,0000003f
+cpu5/cache/index3/shared_cpu_map:3f

What's the rationale on userspace parsing bitmaps in sysfs, we don't
care?

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/