Re: [PATCH 5/6] perf: Optimise topology iteration

From: Lin Ming
Date: Mon Feb 21 2011 - 00:00:50 EST


On Mon, 2011-02-21 at 11:32 +0800, Andi Kleen wrote:
> On Mon, Feb 21, 2011 at 11:29:24AM +0800, Lin Ming wrote:
> > On Mon, 2011-02-21 at 05:15 +0800, Andi Kleen wrote:
> > > On Mon, Feb 21, 2011 at 12:57:39AM +0800, Lin Ming wrote:
> > > > Currently we iterate the full machine looking for a matching core_id/nb
> > > > for the percore and the AMD northbridge stuff; using a smaller topology
> > > > mask makes sense.
> > >
> > > This is still wrong for CPU hotplug. The CPU "owning" the per core
> > > does not necessarily need to be online anymore.
> >
> > This remains an issue for the hotplug case, whether we use
> > for_each_online_cpu or topology_thread_cpumask.
>
> The original code I submitted used for_each_possible_cpu which
> is correct.
>
> >
> > > Please drop this patch.
> >
> > Looking at the code again, I think for_each_online_cpu is wrong for percore;
> > we should use topology_thread_cpumask instead.
>
> No, that's also cleared on unplug. You really need the possible map
> and nothing else.

for_each_possible_cpu is wrong during kernel initialization as well, and
that problem is not related to hotplug.

I wrote a simple debug patch:

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f152930..913a8a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1123,7 +1123,7 @@ static void intel_pmu_cpu_starting(int cpu)
 	if (!ht_capable())
 		return;

-	for_each_cpu(i, topology_thread_cpumask(cpu)) {
+	for_each_possible_cpu(i) {
 		struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;

 		if (pc && pc->core_id == core_id) {
@@ -1135,6 +1135,9 @@ static void intel_pmu_cpu_starting(int cpu)

 	cpuc->per_core->core_id = core_id;
 	cpuc->per_core->refcnt++;
+
+	printk("DEBUG: cpu%d, per_core %p, core_id: %d, ref_count: %d\n",
+		cpu, cpuc->per_core, cpuc->per_core->core_id, cpuc->per_core->refcnt);
 }

 static void intel_pmu_cpu_dying(int cpu)

The output is as follows:

DEBUG: cpu0, per_core ffff8801bec32600, core_id: 0, ref_count: 1
DEBUG: cpu1, per_core ffff8801bec32600, core_id: 0, ref_count: 2
DEBUG: cpu2, per_core ffff8801bec32a20, core_id: 1, ref_count: 1
DEBUG: cpu3, per_core ffff8801bec32a20, core_id: 1, ref_count: 2
DEBUG: cpu4, per_core ffff8801bec32de0, core_id: 2, ref_count: 1
DEBUG: cpu5, per_core ffff8801bec32de0, core_id: 2, ref_count: 2
DEBUG: cpu6, per_core ffff8801becfc120, core_id: 3, ref_count: 1
DEBUG: cpu7, per_core ffff8801becfc120, core_id: 3, ref_count: 2
DEBUG: cpu8, per_core ffff8801bec32600, core_id: 0, ref_count: 3
DEBUG: cpu9, per_core ffff8801bec32600, core_id: 0, ref_count: 4
DEBUG: cpu10, per_core ffff8801bec32a20, core_id: 1, ref_count: 3
DEBUG: cpu11, per_core ffff8801bec32a20, core_id: 1, ref_count: 4
DEBUG: cpu12, per_core ffff8801bec32de0, core_id: 2, ref_count: 3
DEBUG: cpu13, per_core ffff8801bec32de0, core_id: 2, ref_count: 4
DEBUG: cpu14, per_core ffff8801becfc120, core_id: 3, ref_count: 3
DEBUG: cpu15, per_core ffff8801becfc120, core_id: 3, ref_count: 4

As you can see, cpu0, cpu1, cpu8 and cpu9 all share the same per_core
(ffff8801bec32600), and its ref_count reaches 4. This is wrong.

cpu0 and cpu8 should share one per_core, and cpu1 and cpu9 should share
another.
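
To make the difference easier to see outside the kernel, here is a rough
userspace sketch of the same lookup. It is not the kernel code: struct
percore, pool[], bring_up_all() and dump_state() are made-up names, and the
topology (2 packages x 4 cores x 2 threads, with package id = cpu % 2) is an
assumption chosen only so that core_id matches the debug output above and
cpu0/cpu8 come out as thread siblings.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 16

/* Hypothetical stand-in for struct intel_percore: only the printed fields. */
struct percore {
	int core_id;
	int refcnt;
};

static struct percore pool[NR_CPUS];      /* one preallocated slot per CPU */
static struct percore *per_core[NR_CPUS]; /* slot each CPU ends up using */

/* ASSUMED topology, not stated in the thread: 2 packages, 4 cores, 2 threads. */
static int cpu_core_id(int cpu)    { return (cpu / 2) % 4; }
static int cpu_package_id(int cpu) { return cpu % 2; }

/* Stand-in for "b is set in topology_thread_cpumask(a)". */
static int is_thread_sibling(int a, int b)
{
	return cpu_core_id(a) == cpu_core_id(b) &&
	       cpu_package_id(a) == cpu_package_id(b);
}

/*
 * Bring up cpu0..cpu15 in order, mimicking the lookup loop in
 * intel_pmu_cpu_starting(). siblings_only == 0 scans every possible CPU,
 * siblings_only != 0 scans only the thread siblings of the starting CPU.
 */
void bring_up_all(int siblings_only)
{
	int cpu, i;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		per_core[cpu] = NULL;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		int core_id = cpu_core_id(cpu);

		/* "prepare" step: fresh slot with no core assigned yet */
		pool[cpu].core_id = -1;
		pool[cpu].refcnt = 0;
		per_core[cpu] = &pool[cpu];

		for (i = 0; i < NR_CPUS; i++) {
			struct percore *pc = per_core[i];

			if (siblings_only && !is_thread_sibling(cpu, i))
				continue;
			if (pc && pc->core_id == core_id) {
				per_core[cpu] = pc; /* reuse the existing slot */
				break;
			}
		}

		per_core[cpu]->core_id = core_id;
		per_core[cpu]->refcnt++;
	}
}

/* Print the final state: which slot each CPU uses and its final refcnt. */
void dump_state(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		assert(per_core[cpu] != NULL);
		printf("cpu%d, slot %d, core_id: %d, ref_count: %d\n",
		       cpu, (int)(per_core[cpu] - pool),
		       per_core[cpu]->core_id, per_core[cpu]->refcnt);
	}
}
```

Under these assumptions, bring_up_all(0) puts cpu0, cpu1, cpu8 and cpu9 on
one slot with a refcnt of 4, the same grouping as in the debug output.
bring_up_all(1) instead pairs cpu0 with cpu8 and cpu1 with cpu9, each slot
ending with a refcnt of 2. Call dump_state() after either one to print the
result.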

>
> -Andi

