Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of cache_groups always has valid RMID

From: David Carrillo-Cisneros
Date: Thu May 18 2017 - 00:59:46 EST


On Tue, May 16, 2017 at 7:38 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Thu, May 04, 2017 at 10:31:43AM +0800, Zefan Li wrote:
>> It is assumed that the head of cache_groups always has a valid RMID,
>> which isn't true.
>>
>> When we deallocate RMIDs from conflicting events, we currently don't
>> move them to the tail, so one of those events can end up at the
>> head. Another case is that intel_cqm_sched_in_event() allocates RMIDs
>> for all the events except the head event.
>>
>> Besides, there's another bug: __intel_cqm_rmid_rotate() retries the
>> rotation without resetting nr_needed and start.
>>
>> Together, those bugs led to the following oops.
>>
>> WARNING: at arch/x86/kernel/cpu/perf_event_intel_cqm.c:186 __put_rmid+0x28/0x80()
>> ...
>> [<ffffffff8103a578>] __put_rmid+0x28/0x80
>> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>> ...
>> BUG: unable to handle kernel NULL pointer dereference at (null)

I ran into this bug a long time ago but never found an easy way to
reproduce it. Do you have one?

>> ...
>> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
>> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
>> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
>
> I've managed to forget most if not all of that horror show. Vikas and
> David seem to be working on a replacement, but until such a time it
> would be good if this thing would not crash the kernel.
>
> Guys, could you have a look? To me it appears to mostly have the right
> shape, but like I said, I forgot most details...

The patch LGTM. I ran into these issues before and fixed them in a
similar but messier way; then the rewrite started ...

>
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
Acked-by: David Carrillo-Cisneros <davidcc@xxxxxxxxxx>