Re: [PATCH v2] sched: introduce sched_switch_post trace event

From: Cong Wang
Date: Wed Jul 08 2015 - 16:43:01 EST


On Mon, Jul 6, 2015 at 5:15 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Mon, 6 Jul 2015 12:15:45 -0700
> Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
>> Currently we only have one sched_switch trace event
>> for task switching, which is generated very early during
>> task switch. When we try to monitor per-container perf
>> events, this is not what we expect.
>>
>> For example, suppose process A is in the cgroup we monitor and
>> process B is not. When the kernel switches from B to A, the
>> sched_switch event is not recorded for this cgroup since it
>> belongs to B (the current process is still B until we finish
>> the switch), but we require this event to signal that process A
>> in this cgroup gets scheduled. This is crucial for calculating
>> schedule latency (like `perf sched`).
>
> I just want to understand this correctly. Does perf sched only listen
> to events that are executed by the task in a particular cgroup? There's
> no way to say "check sched_switch field next"?
>

The perf_event cgroup has to be specified on the command line, and
`perf sched` doesn't support that currently, so I wrote my own tool
to do this. (I have a patch to add it to `perf sched`.)

As I replied in the previous thread, we can certainly check whether a
process belongs to a cgroup by tracking the PIDs, but that is not easy.

>
> This looks identical to trace_sched_switch. Please convert both to a
> DECLARE_EVENT_CLASS() and DEFINE_EVENT()s.
>

Not identical: I renamed 'next' to 'curr', since once the switch is
done 'next' becomes meaningless.

Thanks.