Re: [PATCH 3/4] sched/deadline: Tracepoints for deadline scheduler

From: Daniel Bristot de Oliveira
Date: Tue Feb 23 2016 - 11:19:27 EST

On 02/23/2016 07:44 AM, Peter Zijlstra wrote:
>>> Worse, the proposed tracepoints are atrocious, look at crap like this:
>>>>
>>>> +	if (trace_sched_deadline_yield_enabled()) {
>>>> +		u64 delta_exec = rq_clock_task(rq) - p->se.exec_start;
>>>> +		/* Subtract the last run till now */
>>>> +		if (likely((s64)delta_exec > 0))
>>>> +			p->dl.runtime -= delta_exec;
>>>> +		trace_sched_deadline_yield(&p->dl);
>>>> +	}
>>>
>>> tracepoints should _NEVER_ change state, ever.
>>
>> Heh, it's not really changing state. The code directly after this is:
>>
>> 	p->dl.runtime = 0;
> Yes, it more or less 'works', but its still atrocious shite. Its the
> worst kind of anti pattern possible.
>
> Suppose someone comes and removes that line, and ignores the tracepoint
> stuff, because, hell its a tracepoint, those don't modify stuff.
>
> Its just really, utterly bad practice.

It is possible to clean this up by removing the "p->dl.runtime = 0;"
from yield_task_dl() and carrying the dl_runtime until update_curr_dl().
We ended up not proposing this change because we thought it would be
too intrusive, but it seems that it was the correct approach.

Btw, your patch "Always calculate end of period on sched_yield()"
already does the necessary cleanup, so in a possible next version of
these tracepoints the hack will disappear.
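
Just to illustrate, with that cleanup the tracepoint becomes a plain
read-only probe. A rough sketch of yield_task_dl() on top of my
understanding of your patch (not the final code) would be:

	static void yield_task_dl(struct rq *rq)
	{
		struct task_struct *p = rq->curr;

		/*
		 * dl_yielded makes update_curr_dl() throttle the task until
		 * the end of its current period; the runtime accounting stays
		 * in update_curr_dl(), so the tracepoint only reads the
		 * already updated sched_dl_entity.
		 */
		p->dl.dl_yielded = 1;

		update_rq_clock(rq);
		update_curr_dl(rq);

		trace_sched_deadline_yield(&p->dl);
	}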

>>> So tell me why these specific tracepoints and why the existing ones
>>> could not be extended to include this information. For example, why a
>>> trace_sched_deadline_yield, and not a generic trace_sched_yield() that
>>> works for all classes.
>>
>> But what about reporting actual runtime, and when the next period will
>> come. That only matters for deadline.
> How is that an answer to the question? Are you implying a generic
> trace_sched_yield() call could not do this?

I agree that a generic trace_sched_yield() could partially do this. But
each scheduler has its own specific data to print, and it is hard to
define in advance how many fields are needed and of which types.
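
For sched_deadline, for example, the yield event wants at least the
remaining runtime and the absolute deadline of the sched_dl_entity. A
rough sketch of such a class-specific event (field names taken from
struct sched_dl_entity; not necessarily the exact event from the patch):

	TRACE_EVENT(sched_deadline_yield,

		TP_PROTO(struct sched_dl_entity *dl),

		TP_ARGS(dl),

		TP_STRUCT__entry(
			__field( s64, runtime  )
			__field( u64, deadline )
			__field( u64, period   )
		),

		TP_fast_assign(
			__entry->runtime  = dl->runtime;
			__entry->deadline = dl->deadline;
			__entry->period   = dl->dl_period;
		),

		TP_printk("runtime=%lld deadline=%llu period=%llu",
			  __entry->runtime, __entry->deadline,
			  __entry->period)
	);

A CFS or RT yield would carry a different (and smaller) set of fields,
which is exactly what is hard to fold into a single generic event.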

>>> But do not present me with a bunch of random arse, hacked together
>>> tracepoints and tell me they might be useful, maybe.
>>
>> They ARE useful. These are the tracepoints I'm currently using to
>> debug the deadline scheduler with. They have been indispensable for my
>> current work.
> They are, most obviously, a hacked together debug session for sure. This
> is _NOT_ what you commit.
>
> Now ideally we'd do something like the below, but because trainwreck, we
> cannot actually do this I think :-(

I do like this patch, but I agree that bad things can happen to
applications that already use sched:sched_switch :-(.

> I really dislike tracepoints, and I'm >.< close to proposing a patch
> removing them all from the scheduler.

Scheduler tracepoints have become a de facto standard not only for
debugging the kernel, but also for understanding performance issues and
finding optimizations for user-space tasks, e.g. finding the best way to
spread tasks among processors on a large NUMA box. Many people would cry
if the sched tracepoints disappeared :'-(.

> @@ -132,33 +177,63 @@ TRACE_EVENT(sched_switch,
>  	TP_STRUCT__entry(
>  		__array( char,  prev_comm, TASK_COMM_LEN )
>  		__field( pid_t, prev_pid )
> -		__field( int,   prev_prio )
> +		__field( int,   prev_policy )
> +		__field( s64,   prev_f1 )
> +		__field( u64,   prev_f2 )
> +		__field( u64,   prev_f3 )

This helps me explain why I chose to create deadline-specific
tracepoints.

It is not possible to define in advance a common set of parameters to be
traced/printed for every future (exotic) scheduler, because each
scheduler will use its own set of parameters, so the number and types of
the fields can vary. How many fields do we need, and which types should
they have? What if a scheduler needs two s64 fields and no u64?

Moreover, it is not possible to predict the changes that classical
scheduling algorithms will need in order to fit Linux's abstractions and
restrictions. Therefore, it is not safe to assume that LLF would
(theoretically) use the same set of parameters as an EDF scheduler,
because the theoretical LLF would need to be modified to fit Linux. For
example, there is no throttling concept in the classical deadline
scheduler. So the proposed tracepoints are only valid for Linux's sched
deadline scheduler (deadline scheduler + CBS).

That is why I believe it would be a better approach to have
scheduler-specific tracepoints for the sched_*_entity particularities
and for the scheduler-specific points of interest.

Finally, I followed the same approach used by the fair scheduler's
sched:sched_stat* tracepoints: they report sched_entity data in fair.c,
and my patches report sched_dl_entity data in deadline.c.
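
For instance, update_curr() in fair.c reports the fair-specific vruntime
via trace_sched_stat_runtime(); the natural deadline counterpart is to
report the CBS state from update_curr_dl() in the same way (the event
name below is just illustrative, not necessarily the one in the patch):

	/* kernel/sched/fair.c: update_curr() reports sched_entity data */
	trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime);

	/*
	 * kernel/sched/deadline.c: update_curr_dl() would report the
	 * sched_dl_entity data (remaining runtime, current deadline)
	 * from the equivalent spot -- hypothetical event name:
	 */
	trace_sched_deadline_runtime(curr, &curr->dl);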