Re: [RFC PATCH 1/3] sched, tracing: report task state in symbolic chars instead

From: Ze Gao
Date: Tue Jul 25 2023 - 06:55:56 EST


On Tue, Jul 25, 2023 at 4:34 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 25, 2023 at 03:22:52PM +0800, Ze Gao wrote:
>
> > @@ -197,13 +198,6 @@ static inline long __trace_sched_switch_state(bool preempt,
> > BUG_ON(p != current);
> > #endif /* CONFIG_SCHED_DEBUG */
> >
> > - /*
> > - * Preemption ignores task state, therefore preempted tasks are always
> > - * RUNNING (we will not have dequeued if state != RUNNING).
> > - */
> > - if (preempt)
> > - return TASK_REPORT_MAX;
> > -
> > /*
> > * task_state_index() uses fls() and returns a value from 0-8 range.
> > * Decrement it by 1 (except TASK_RUNNING state i.e 0) before using
> > @@ -212,7 +206,16 @@ static inline long __trace_sched_switch_state(bool preempt,
> > */
> > state = __task_state_index(prev_state, p->exit_state);
> >
> > - return state ? (1 << (state - 1)) : state;
> > + /*
> > + * Preemption ignores task state, therefore preempted tasks are always
> > + * RUNNING (we will not have dequeued if state != RUNNING).
> > + * Here, we use 'p' to denote this case and only for this case.
> > + */
> > + if (preempt)
> > + return 'p';
> > +
>
> I don't get this move, why compute state before this return?

Oops, I was going to ignore the PREEMP_ACTIVE in the first attempt
and changed it to 'state = 0'
which I decided to introduce 'p' to denote this after second thoughts.
Will fix it and revert this move.

> > +
> > + return task_index_to_char(state);
> > }
> > #endif /* CREATE_TRACE_POINTS */
> >
> > @@ -232,7 +235,7 @@ TRACE_EVENT(sched_switch,
> > __array( char, prev_comm, TASK_COMM_LEN )
> > __field( pid_t, prev_pid )
> > __field( int, prev_prio )
> > - __field( long, prev_state )
> > + __field( char, prev_state )
> > __array( char, next_comm, TASK_COMM_LEN )
> > __field( pid_t, next_pid )
> > __field( int, next_prio )
>
> This is a format change and will likely break a ton of programs :/

Yeah, I admit that. And I believe this kind of break happens each
time the internal
task state constant mapping is rearranged, it's of no big difference
here, since the
most renowned perf itself is still broken at this time after. And
IMHO it's time to fix
this and do things correctly. That is why I propose this and mark it as RFC.

BTW, could you help to point to any possible tools/programs that would
break other
than perf/libtraceevent, because these two are the only users I run
into so far.

Regards,
Ze