Re: [PATCH 1/2] sched: introduce helper function to calculate distribution over sched class

From: Zhaoyang Huang
Date: Wed Feb 21 2024 - 21:58:43 EST


On Thu, Feb 22, 2024 at 1:51 AM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Tue, 20 Feb 2024 at 07:16, zhaoyang.huang <zhaoyang.huang@xxxxxxxxxx> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > As RT, DL and IRQ time can be regarded as time lost by CFS tasks, some
>
> It's lost only if cfs has been actually preempted
Yes. Actually, I just want the approximate proportion of time for which
CFS tasks (the whole runqueue) are preempted. Preemption among CFS tasks
is not considered.
>
> > timing values need to know approximately how that time is distributed,
> > using the utilization accounting values (nivcsw is sometimes not
> > enough). This commit introduces a helper function to achieve this
> > goal.
> >
> > eg.
> > Effective part of A = Total_time * cpu_util_cfs / cpu_util
> >
> > Timing value A
> > (should be a process that lasts for several TICKs, or statistics of a
> > repeated process)
> >
> > Timing start
> > |
> > |
> > preempted by RT, DL or IRQ
> > |\
> > | This period is an involuntary CPU give-up; need to know how long
> > |/
>
> preempted means that a cfs task stops running on the cpu and lets
> another rt/dl task or an irq run on the cpu instead. We can't know
> that. We know an average ratio of time spent in rt/dl and irq contexts
> but not if the cpu was idle or running cfs task
OK, I will take idle time into consideration. As explained above,
preemption among CFS tasks is deliberately not considered.
>
> > sched in again
> > |
> > |
> > |
> > Timing end
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > ---
> > include/linux/sched.h | 1 +
> > kernel/sched/core.c | 20 ++++++++++++++++++++
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 77f01ac385f7..99cf09c47f72 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2318,6 +2318,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
> >
> > /* Returns effective CPU energy utilization, as seen by the scheduler */
> > unsigned long sched_cpu_util(int cpu);
> > +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val);
> > #endif /* CONFIG_SMP */
> >
> > #ifdef CONFIG_RSEQ
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 802551e0009b..217e2220fdc1 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7494,6 +7494,26 @@ unsigned long sched_cpu_util(int cpu)
> > {
> > return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
> > }
> > +
> > +/*
> > + * Calculate the approximate proportion of a timing value consumed in cfs.
> > + * The user must be aware that this is done by avg_util, which is tracked by
> > + * a geometric series that decays the load by y^32 = 0.5 (unit is 1ms).
> > + * That is, only periods that last for at least several TICKs, or statistics
> > + * of repeated timing values, are suitable for this helper function.
> > + */
> > +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val)
> > +{
> > +	unsigned int cpu = task_cpu(tsk);
> > +	struct rq *rq = cpu_rq(cpu);
> > +	unsigned long util;
> > +
> > +	if (tsk->sched_class != &fair_sched_class)
> > +		return val;
> > +	util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_irq(rq) + cpu_util_dl(rq);
>
> This is not correct as irq is not on the same clock domain: look at
> effective_cpu_util()
>
> You don't care about idle time ?
OK, I will check. Thanks.
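
For reference, a minimal userspace sketch of the scaling pattern used in
effective_cpu_util(): the CFS, RT and DL signals are tracked on a clock
that excludes IRQ time, so they are scaled by (max - irq) / max before
the IRQ utilization is added, and whatever remains is idle. The names
util_split, scale_by_irq and split_util are hypothetical, and the inputs
are plain numbers rather than values read from a runqueue, so this is an
illustration of the arithmetic only, not a proposed implementation of
the helper.

```c
#define SCHED_CAPACITY_SCALE 1024UL	/* full capacity of one CPU, as in the kernel */

/* Hypothetical container: every field is in the wall-clock domain, out of 1024. */
struct util_split {
	unsigned long cfs;
	unsigned long rt_dl;
	unsigned long irq;
	unsigned long idle;
};

/* Same arithmetic as the kernel's scale_irq_capacity(): util * (max - irq) / max. */
static unsigned long scale_by_irq(unsigned long util, unsigned long irq,
				  unsigned long max)
{
	return util * (max - irq) / max;
}

/* Hypothetical helper: split one CPU's capacity into cfs, rt+dl, irq and idle. */
static struct util_split split_util(unsigned long cfs, unsigned long rt,
				    unsigned long dl, unsigned long irq)
{
	const unsigned long max = SCHED_CAPACITY_SCALE;
	struct util_split s = { 0, 0, 0, 0 };
	unsigned long busy;

	/* IRQ time cannot exceed the whole CPU. */
	if (irq > max)
		irq = max;

	s.irq = irq;
	/* Move the task-clock signals into the same domain as irq. */
	s.cfs = scale_by_irq(cfs, irq, max);
	s.rt_dl = scale_by_irq(rt + dl, irq, max);

	/* Anything not accounted as busy is treated as idle. */
	busy = s.cfs + s.rt_dl + s.irq;
	if (busy < max)
		s.idle = max - busy;

	return s;
}
```

With cfs = 512, rt = 128, dl = 0 and irq = 256, this gives cfs = 384,
rt_dl = 96, irq = 256 and idle = 288 out of 1024. Whether the helper
should divide by the busy part only or by the full 1024 depends on how
idle time is meant to be counted, which is still open in this thread.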
>
> > +	return min(val, cpu_util_cfs(cpu) * val / util);
> > +}
> > +
> > #endif /* CONFIG_SMP */
> >
> > /**
> > --
> > 2.25.1
> >