Re: [PATCH V6 0/2] tracing, perf: cpu hotplug trace events

From: Vincent Guittot
Date: Wed Mar 02 2011 - 14:02:08 EST


On 2 March 2011 11:57, Thomas Renninger <trenn@xxxxxxx> wrote:
> On Wednesday, March 02, 2011 08:56:25 AM Ingo Molnar wrote:
>>
>> * Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>>
>> > This patchset adds some tracepoints for tracing cpu state and for
>> > profiling the plug and unplug sequence.
>> >
>> > Some SMP arm platforms use the cpu hotplug feature to improve their
>> > power saving because they can go into their deepest idle state only in
>> > mono core mode. In addition, running in mono core mode makes the
>> > cpuidle job easier and more efficient, which also improves the
>> > power saving of some use cases. As the plug state of a
>> > cpu can impact the cpuidle behavior, it's interesting to trace this
>> > state and to correlate it with cpuidle traces.
>> > Then, cpu hotplug is known to be an expensive operation which also
>> > takes a variable time depending on other processes' activity (from
>> > hundreds of ms up to a few seconds). These traces have shown that the
>> > arch part stays almost constant on arm platforms whatever the cpu load
>> > is, whereas the plug duration increases.
>> >
>> > ---
>> >  include/trace/events/cpu_hotplug.h |  103 ++++++++++++++++++++++++++++++++++++
>> >  kernel/cpu.c                       |   18 ++++++
>> >  2 files changed, 121 insertions(+), 0 deletions(-)
>> >  create mode 100644 include/trace/events/cpu_hotplug.h
>>
>> Why not do something much simpler and fit these into the existing
>> power:* events:
>>
>>      power:cpu_idle
>>      power:cpu_frequency
>>      power:machine_suspend
>>
>> in an intelligent way?
>>
>> CPU hotplug is really a 'soft' form of suspend and tools using power
>> events could thus immediately use CPU hotplug events as well.
>>
>> A suitable new 'state' value could be used to signal CPU hotplug events:
>>
>>  enum {
>>         POWER_NONE = 0,
>>         POWER_CSTATE = 1,
>>         POWER_PSTATE = 2,
>>  };
>>
>> POWER_HSTATE for hotplug-state, or so.
> Be careful, these are obsolete!
> This information is in the name of the event itself:
> PSTATE -> CPU frequency     -> power:cpu_frequency
> CSTATE -> sleep/idle states -> power:cpu_idle
>
>> This would also express the design arguments that others have pointed
>> out in the prior discussion: that CPU hotplug is really a power
>> management variant, and that in the long run it could be done via
>> regular idle as well. When that happens, the above unified event
>> structure makes it all even simpler - analysis tools will just
>> continue to work fine.
>
> About the patch:
> You create:
> cpu_hotplug:cpu_hotplug_down_start
> cpu_hotplug:cpu_hotplug_down_end
> cpu_hotplug:cpu_hotplug_up_start
> cpu_hotplug:cpu_hotplug_up_end
> cpu_hotplug:cpu_hotplug_disable_start
> cpu_hotplug:cpu_hotplug_disable_end
> cpu_hotplug:cpu_hotplug_die_start
> cpu_hotplug:cpu_hotplug_die_end
> cpu_hotplug:cpu_hotplug_arch_up_start
> cpu_hotplug:cpu_hotplug_arch_up_end
>
> quite some events for cpu hotplugging...
> You mix up two things you want to trace:
>  1) The cpu hotplugging itself which you might want to compare
>     with system activity, other idle states, etc. and check whether
>     removing/adding CPUs works in respect of your power saving
>     algorithms
>  2) You want to trace the time __cpu_down and friends take to
>     optimize them
>
> For 1. I agree that it would be worthwhile (mostly for arm now, as long
> as it's the only arch using this as a power saving feature, but it may
> show up on other archs as well) to create an event which looks like:
>
> power:cpu_hotplug(unsigned int state, unsigned int cpu_id)
>

If it's possible to add such a cpu_hotplug event to the power event
class, that would be fine with me.
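
To check that the proposal is read correctly, here is a minimal
userspace sketch of the record that one event would carry. It is not
kernel code and not part of the patchset: the struct and the
format_cpu_hotplug() helper are hypothetical names used only for
illustration, the two state values are the ones proposed below, and
the state=/cpu_id= output style is an assumption modelled on the
other power:* events.

```c
#include <assert.h>
#include <stdio.h>

/* State values taken from the proposal quoted below. */
enum cpu_hotplug_state {
	CPU_HOT_PLUG   = 1,
	CPU_HOT_UNPLUG = 2,
};

/* Hypothetical record: mirrors the (state, cpu_id) arguments only. */
struct cpu_hotplug_event {
	unsigned int state;
	unsigned int cpu_id;
};

/*
 * Format one event in a "state=... cpu_id=..." style (assumed layout).
 * Returns the number of characters snprintf() would have written.
 */
static int format_cpu_hotplug(const struct cpu_hotplug_event *ev,
			      char *buf, size_t len)
{
	/* Only the two states from the proposal are accepted here. */
	assert(ev->state == CPU_HOT_PLUG || ev->state == CPU_HOT_UNPLUG);
	return snprintf(buf, len, "cpu_hotplug: state=%u cpu_id=%u",
			ev->state, ev->cpu_id);
}
```

With a single event, a tool only has to look at the state field to
know whether the cpu went up or down.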

> Define a state:
> CPU_HOT_PLUG 1
> CPU_HOT_UNPLUG 2
> This would be consistent with other power:* events. One idea of having
> one event passing the state is, that it does not make sense to track an:
> power:cpu_hotunplug or power:cpu_hotplug
> standalone.
>
> Theoretically this could get enhanced with further states:
> CPU_HOT_PLUG_DISABLE_IRQS 3
> CPU_HOT_PLUG_ENABLE_IRQS  4
> CPU_HOT_PLUG_ACTIVATE     5
> CPU_HOT_PLUG_DISABLE      6
> ...
> if it should be possible at some point to only disable IRQs or to
> only disable code processing or to only disable whatever to achieve
> better power savings.
> But as long as there only is the general cpu_hotplug interface
> bringing the cpu totally up or down, above should be enough in
> respect of power saving tracings.
>
>
> For 2. you should use more appropriate tools to optimize the code
> processed in __cpu_{,up,down,enable,disable,die} functions and friends.
> If you simply need the time, systemtap or kprobes might work out for you.
> There is preloadtrace.ko based on a systemtap script which instruments
> functions called at boot up and measures their time.
>
> Or probably better are perf profiling facilities. It should be possible
> to profile __cpu_down and subsequent calls in detail. Like that you
> should get a good picture which functions you have to look at and
> optimize. People in CC should better be able to tell you the exact perf
> commands and parameters you are looking for.
>

I tried to get this kind of information with the function and
function_graph tracers, but some functions, like _cpu_down, are not
listed in "available_filter_functions". Also, the function trace does
not give us the cpuid information, which is not so bad on a dual core
but becomes more important on a quad core. That's why I added some
cpu_hotplug traces, but I'm not a trace expert and I could have
missed the solution.
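
To illustrate why the cpu id matters, here is a minimal userspace
sketch of how a tool could pair start and end events per cpu to get a
duration. The function names, the MAX_CPUS value and the nanosecond
timestamps are assumptions made for the example; they do not come
from the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 4	/* assumption: a quad core system */

/*
 * Start timestamp (ns) of the operation in flight on each cpu.
 * 0 means "no start seen", so timestamps are assumed to be nonzero.
 */
static uint64_t start_ns[MAX_CPUS];

/* Record a *_start event for the given cpu. */
static void hotplug_start(unsigned int cpu, uint64_t now_ns)
{
	assert(cpu < MAX_CPUS);
	start_ns[cpu] = now_ns;
}

/*
 * Match a *_end event with the start recorded for the same cpu.
 * Returns the duration in ns, or 0 if no start was recorded.
 */
static uint64_t hotplug_end(unsigned int cpu, uint64_t now_ns)
{
	uint64_t begin;

	assert(cpu < MAX_CPUS);
	begin = start_ns[cpu];
	start_ns[cpu] = 0;	/* the pair is consumed */
	return begin ? now_ns - begin : 0;
}
```

Without the cpu id in the trace, two overlapping sequences on
different cpus could not be told apart this way.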

>
> Hm, have you tried/thought about registering an extra cpuidle state with
> long latency doing the cpu_down? For CPU 0 it could call the deepest
> "normal" sleep state, but could decide to shut other cpus down. Like that
> you might be able to get rid of some extra code (interfering with cpuidle
> driver?) and you get all the statistics, etc. for free.
>

No, I haven't tried such a mechanism, but are you sure that we could
call cpu_down from a cpuidle function?
I'm still looking for relevant triggers for plugging/unplugging the
cpu: the current cpu load and loadavg are some interesting ones.
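
Purely as an illustration of what a load-based trigger could look
like, here is a minimal sketch with two thresholds and a dead band
between them. The thresholds, the enum and the hotplug_decide() name
are all hypothetical; this is not an existing policy and the values
are not measured.

```c
#include <assert.h>

enum hotplug_decision { KEEP, PLUG_CPU, UNPLUG_CPU };

/* Hypothetical thresholds, in percent of load; not measured values. */
#define LOAD_HIGH_PCT 80
#define LOAD_LOW_PCT  20

/*
 * Decide whether to plug or unplug one cpu from the current load.
 * The gap between the two thresholds avoids plugging and unplugging
 * back and forth when the load hovers around a single limit.
 */
static enum hotplug_decision hotplug_decide(unsigned int load_pct,
					     unsigned int online,
					     unsigned int possible)
{
	assert(online >= 1 && online <= possible);

	if (load_pct > LOAD_HIGH_PCT && online < possible)
		return PLUG_CPU;
	if (load_pct < LOAD_LOW_PCT && online > 1)
		return UNPLUG_CPU;
	return KEEP;
}
```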

Thanks

Vincent
>
>   Thomas
>