[RFC PATCH] perf/x86/intel/rapl: avoid accessing unallocated memory

From: Sebastian Andrzej Siewior
Date: Wed Nov 02 2016 - 08:26:05 EST


After the hotplug rework, Charles Williams reported that his VMware
virtualized system no longer boots and crashes in rapl_cpu_online().
As it turns out, topology_max_packages() reports four while
topology_logical_package_id() for CPUs two and three returns 65535. That
means cpu_to_rapl_pmu() for those CPUs accesses unallocated memory
beyond the end of rapl_pmus->pmus[].
"M. Vefa Bicakci" reported the same problem on Xen.
This patch ensures we error out in such an invalid situation.

Reported-by: "Charles (Chas) Williams" <ciwillia@xxxxxxxxxxx>
Tested-by: "M. Vefa Bicakci" <m.v.b@xxxxxxxxxx>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
I am not sure if this is a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakci and Charles) say that the box sometimes
boots fine (without the patch). smp_store_boot_cpu_info() should have run
before the notifier and thus should have set the info properly. However, I got
the following boot log from Charles with this patch:

[ 0.017110] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.017111] smpboot: APIC(1) Converting physical 1 to logical package 1
[ 0.017113] smpboot: Max logical packages: 2
...
[ 1.995494] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535
[ 1.995647] rapl pmu error: max package: 2 but CPU1 belongs to 65535

So it seems that the information got overwritten. I am not sure how to proceed
here. That memory corruption should be found and fixed, and a boot crash might
motivate one to do so... I can't reproduce this on bare metal.
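
For illustration only, here is a userspace sketch of the check the patch
adds. It is not kernel code: first_invalid_cpu() and sample_pkg_ids are
hypothetical names, the array just mirrors the values reported above, and
reading 65535 as a -1 stored in a 16-bit unsigned field is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-CPU logical package IDs reported on
 * the failing VMware guest: CPUs 2 and 3 carry 65535.
 */
const uint16_t sample_pkg_ids[4] = { 0, 1, 65535, 65535 };

/*
 * Return the index of the first CPU whose package ID would index past a
 * pmus[] array with maxpkg slots, or -1 if every CPU maps to a valid slot.
 */
int first_invalid_cpu(const uint16_t *ids, int ncpus, int maxpkg)
{
	int cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		/* ids[cpu] is promoted to int, so 65535 compares as 65535 */
		if (ids[cpu] >= maxpkg)
			return cpu;
	}
	return -1;
}
```

With the sample values and a maximum of four packages, the helper flags
CPU 2, which is the first CPU that would index past the end of the array.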

Thread starts at
d40f8e3c-b332-c331-38b9-11eb4f4aaaa7@xxxxxxxxxxx

arch/x86/events/intel/rapl.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
 	int maxpkg = topology_max_packages();
 	size_t size;
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (topology_logical_package_id(cpu) >= maxpkg) {
+			pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+			       maxpkg, cpu, topology_logical_package_id(cpu));
+			return -EINVAL;
+		}
+	}

 	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
 	rapl_pmus = kzalloc(size, GFP_KERNEL);
--
2.10.2