Re: [PATCH] oprofile: fix CPU unplug panic in ppro_stop()

From: Andi Kleen
Date: Wed Dec 03 2008 - 09:08:28 EST


Robert Richter wrote:
On 02.12.08 09:17:29, Ingo Molnar wrote:
* Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

If oprofile is statically compiled into the kernel, a cpu unplug triggers
a panic in ppro_stop(), because a NULL pointer is dereferenced.

Signed-off-by: Eric Dumazet <dada1@xxxxxxxxxxxxx>
---
arch/x86/oprofile/op_model_ppro.c | 4 ++++
1 files changed, 4 insertions(+)
diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 716d26f..e9f80c7 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -156,6 +156,8 @@ static void ppro_start(struct op_msrs const * const msrs)
 	unsigned int low, high;
 	int i;
+	if (!reset_value)
+		return;

 	for (i = 0; i < num_counters; ++i) {
 		if (reset_value[i]) {
 			CTRL_READ(low, high, msrs, i);

The patch fixes the NULL pointer access and this is ok. But the root
cause seems to be in the cpu hotplug and initialization
code. xxx_start() should not be called before xxx_setup_ctrs() or
after xxx_shutdown().

Yes, it would be better to fix that. At least it would make
the code cleaner than adding checks for this backdoor everywhere.
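
For illustration only, the guard discussed above can be modelled in
user space roughly as follows. This is a minimal sketch, not the kernel
code: calloc()/free() stand in for the kernel allocator, and the
model_* function names and the "started" counter are hypothetical.
Only reset_value and num_counters are names taken from the patch.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-ins; only reset_value and num_counters
 * are names borrowed from the patch. */
static unsigned long *reset_value;      /* NULL until model_setup_ctrs() runs */
static unsigned int num_counters = 2;
static unsigned int started;            /* how many counters start "enabled" */

/* Allocate the per-counter table, as a setup step would. 0 on success. */
static int model_setup_ctrs(void)
{
	reset_value = calloc(num_counters, sizeof(*reset_value));
	if (!reset_value)
		return -1;
	reset_value[0] = 100000;        /* only counter 0 is configured */
	return 0;
}

/* Release the table and reset the pointer so later calls see NULL. */
static void model_shutdown(void)
{
	free(reset_value);
	reset_value = NULL;
}

/* Same shape as the patched function: bail out if setup never ran. */
static void model_start(void)
{
	unsigned int i;

	if (!reset_value)
		return;

	for (i = 0; i < num_counters; ++i) {
		if (reset_value[i])
			++started;
	}
}
```

Calling model_start() before model_setup_ctrs() or after
model_shutdown() is then a no-op instead of a NULL dereference. The
cost is the one raised above: every entry point that touches the table
needs the same check.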

Also, running only xxx_start() and xxx_stop() in
the cpu notifier functions is not sufficient. There is at least some
on_each_cpu code in nmi_setup() that should be called also in the cpu
notifier functions. I have to review that code.
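
As a rough sketch of the ordering described above, assuming a simple
per-CPU state table: the online path runs the setup step before start,
and the offline path runs stop before shutdown. All names here
(MODEL_NR_CPUS, struct model_cpu_state, the model_cpu_* functions) are
hypothetical and are not the oprofile or cpu notifier API.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4                 /* hypothetical CPU count */

struct model_cpu_state {
	bool setup_done;                /* per-CPU setup step has run */
	bool running;                   /* counters are started */
};

static struct model_cpu_state model_cpu[MODEL_NR_CPUS];

static void model_cpu_setup(int cpu)
{
	model_cpu[cpu].setup_done = true;
}

static void model_cpu_shutdown(int cpu)
{
	model_cpu[cpu].setup_done = false;
}

/* Returns false when called out of order, i.e. without a prior setup. */
static bool model_cpu_start(int cpu)
{
	if (!model_cpu[cpu].setup_done)
		return false;
	model_cpu[cpu].running = true;
	return true;
}

static void model_cpu_stop(int cpu)
{
	model_cpu[cpu].running = false;
}

/* Online path: setup first, then start. */
static bool model_cpu_online(int cpu)
{
	model_cpu_setup(cpu);
	return model_cpu_start(cpu);
}

/* Offline path: stop first, then tear down the per-CPU state. */
static void model_cpu_offline(int cpu)
{
	model_cpu_stop(cpu);
	model_cpu_shutdown(cpu);
}
```

With this ordering the start step never sees a CPU that has not been
set up, so it would not need a defensive check of its own.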

AFAIK cpu hotplug has more problems in oprofile anyways. That is why
I didn't test that case.


[...]

It was absolutely unnecessary to add kmalloc to this rarely executed
codepath - and the way it was added was absolutely horrible as well, it
was tacked on in the middle of an existing codepath, instead of
factoring it out nicely. Perfmon will eventually replace PMC management
anyway, so there was no "this way it's cleaner" argument either. So this
code should have been changed minimally, instead of slapping in a full
kmalloc for a simple array extension from 2 to 4 entries ...

Ingo, you are right that using kmalloc is unnecessary for
reset_value. So, Andi, maybe you could make this code easier?

The reason I added the kmalloc is that there's also a varying number
of separate fixed function counters (although that's not currently
submitted).

Also I would prefer to not have a hard coded number for future
CPUs. Contrary to other people's opinion architectural perfmon is
not for Nehalem only.
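
To make the trade-off concrete, here is a minimal user-space sketch of
a table sized at run time rather than by a hard coded constant. It
assumes the counter count is simply passed in as a parameter; how the
count would actually be discovered is not shown in this thread. The
struct and function names are hypothetical, and calloc()/free() stand
in for the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical container: table length is decided at run time. */
struct model_counters {
	unsigned int num_counters;
	unsigned long *reset_value;
};

/* Allocate a zeroed table with n entries. Returns 0 on success. */
static int model_counters_init(struct model_counters *c, unsigned int n)
{
	c->num_counters = 0;
	c->reset_value = NULL;
	if (n == 0)
		return -1;

	c->reset_value = calloc(n, sizeof(*c->reset_value));
	if (!c->reset_value)
		return -1;

	c->num_counters = n;
	return 0;
}

/* Free the table and leave the struct in a state that is safe to test. */
static void model_counters_free(struct model_counters *c)
{
	free(c->reset_value);
	c->reset_value = NULL;
	c->num_counters = 0;
}
```

The same code then covers 2 entries, 4 entries, or a larger count, at
the price of an allocation that can fail and a pointer that is NULL
until it has been set up.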

-Andi