Re: [GIT PULL] perf fixes

From: Stephane Eranian
Date: Fri Mar 15 2013 - 06:50:29 EST


On Fri, Mar 15, 2013 at 9:01 AM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
> On Fri, Mar 15, 2013 at 2:06 AM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Thu, Mar 14, 2013 at 5:24 PM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>>>
>>> I bet if you force the affinity of your perf record to be on
>>> a CPU other than CPU0, you will not get the crash.
>>>
>>> This is what I am seeing now. I appears on resume,
>>> CPU0 hotplug callbacks for perf_events are not invoked
>>> leaving DS_AREA MSR to 0.
>>>
>>> Can you confirm on your machine?
>>
>> I'm not even going to bother confirming it, because I think you're
>> right, and I think the reason is clear: the DS initialization code
>> uses the CPU_UP notifiers.
>>
> Ok, I instrumented the pebs_enable() function and I confirm that
> DS_AREA=0 on resume.
>
> So what seems broken here for me is that on suspend, the cpu notifier
> ends up calling fini_debug_store() to clear DS_AREA for CPU0. But
> on resume, the same notifier does NOT call the init_debug_store().
> I don't understand this asymmetry. You either do neither or you do
> both.
>
Ok, corrections. I ran some more tests. On the suspend path, the cpu
notifier is not called for CPU0. However when the machine comes back
up, DS_AREA is 0 which is the power-up default value. And given that
the notifier is not called for CPU0, that is the value we inherit later on
and which causes the crash.

>
>> And that's sufficient for CPU hotplug, which is what suspend/resume
>> ends up doing for all but the boot CPU. But the boot CPU is not
>> hotplugged.
>>
>> Using CPU_UP notifiers is wrong, and they get called too late anyway.
>>
>> The code should use a real resume method. Or, better yet, just do it
>> right, and do it from __restore_processor_state().
>>
I will produce a patch to use this function, it's simple enough.

>> Those f*cking CPU notifiers are a pain in the ass, and the tend to be
>> invariably broken, and they have their own idiotic hacks that are
>> equally broken (ie that x86_pmu_notifier() thing seems to make up its
>> own suspend/resume with
>> "x86_pmu.cpu_prepare/cpu_starting/cpu_dying/cpu_dead" things.
>>
>> I guess we could make the BP do a fake cpu notifier thing around the
>> suspend of the boot processor as well, but most of the per-CPU stuff
>> seems to be perfectly fine without it (ie mtrr, apic, etc etc all use
>> the suspend/resume infrastructure) and doesn't need that kind of
>> stuff.
>>
>> Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/