Re: perf: perf_fuzzer crashes on Pentium 4 systems

From: Vince Weaver
Date: Thu Apr 04 2019 - 15:01:21 EST


On Thu, 4 Apr 2019, Cyrill Gorcunov wrote:

> On Thu, Apr 04, 2019 at 12:37:18PM -0400, Vince Weaver wrote:
>
> Oh, Vince, I suspect such kind of bisection might consume a lot of your
> time :( Maybe we could update perf fuzzer so that it would send events
> to some net-storage first then write them to the counters, iow to automatize
> this all stuff somehow?

I do have a lot of this automated already from tracking down past bugs,
but it turns out that most of the fuzzer-found bugs aren't deterministic
so it doesn't always work.

For example, this bug is easy to reproduce, but it doesn't trigger at
the same point on each run. I suspect something corrupts things, but the
crash doesn't trigger until a context switch happens.

For what it's worth I've put code in p4_pmu_enable_all() to see what's
going on when the NULL dereference happens, and sure enough the printk is
triggered where I'd expect.

[ 138.132889] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.171380] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.212588] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.263761] VMW: p4_pmu_enable_all: idx 4 is NULL
[ 138.279944] VMW: p4_pmu_enable_all: idx 4 is NULL

static void p4_pmu_enable_all(int added)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int idx;

	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
		struct perf_event *event = cpuc->events[idx];
		if (!test_bit(idx, cpuc->active_mask))
			continue;
		if (event == NULL) {
			printk("VMW: p4_pmu_enable_all: idx %d is NULL\n", idx);
		} else {
			p4_pmu_enable_event(event);
		}
	}
}


The machine still crashes after this, but not right away.
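
To spell out what the printk above is catching: a bit is set in
active_mask while the matching events[] slot is NULL. Below is a rough
user-space model of that consistency check. It is only a sketch, not
kernel code: the model_* names, the 8-counter size and the plain 32-bit
mask are made up for illustration and stand in for cpu_hw_events and
its bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size for this model only; not taken from the P4 PMU. */
#define MODEL_NUM_COUNTERS 8

/* Hypothetical stand-in for struct perf_event. */
struct model_event {
	int id;
};

/* Hypothetical stand-in for the per-CPU struct cpu_hw_events. */
struct model_cpu_events {
	struct model_event *events[MODEL_NUM_COUNTERS];
	uint32_t active_mask;	/* bit idx set => counter idx is in use */
};

/*
 * Return the first index whose active bit is set while its event slot
 * is NULL (the state the debug printk reports), or -1 if the mask and
 * the slots agree.
 */
static int model_first_stale_idx(const struct model_cpu_events *c)
{
	int idx;

	assert(c != NULL);

	for (idx = 0; idx < MODEL_NUM_COUNTERS; idx++) {
		if (!(c->active_mask & (UINT32_C(1) << idx)))
			continue;
		if (c->events[idx] == NULL)
			return idx;
	}
	return -1;
}
```

With bit 4 set in active_mask and events[4] cleared, the helper returns
4, which corresponds to the "idx 4 is NULL" lines in the log. It only
detects the inconsistent state; it says nothing about what cleared the
slot or left the bit set.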

Vince