[PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class

From: zhaoyang.huang
Date: Thu Feb 22 2024 - 04:23:31 EST


From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>

Time spent in RT, DL and IRQ context can be regarded as lost time from
the point of view of a CFS task. Some timing values need to know
approximately how a measured period is distributed among these classes,
which can be estimated from the utilization accounting values (nivcsw
alone is sometimes not enough). This commit introduces a helper function
for that purpose.

e.g.
Effective part of A = Total_time * cpu_util_cfs / cpu_util

Timing value A
(should be a process that lasts for several ticks, or the statistics of a
repeated process)

Timing start
|
|
preempted by RT, DL or IRQ
|\
| This period is an involuntary loss of the CPU; we need to know how long it is
|/
scheduled in again
|
|
|
Timing end
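
As a rough illustration of the formula above, the following userspace sketch splits a measured interval into the part attributed to CFS and the part lost to RT, DL and IRQ. The function names and the zero/clamp handling are hypothetical and not part of this patch; only the ratio itself comes from the formula above.

```c
#include <stdint.h>

/*
 * Hypothetical userspace helper, not part of this patch.
 * Effective part of A = total_ns * cfs_util / cpu_util
 * Assumption: return 0 when cpu_util is 0, and clamp cfs_util to
 * cpu_util so the result never exceeds total_ns.
 */
static uint64_t effective_ns(uint64_t total_ns, uint64_t cfs_util,
			     uint64_t cpu_util)
{
	if (cpu_util == 0)
		return 0;
	if (cfs_util > cpu_util)
		cfs_util = cpu_util;
	return total_ns * cfs_util / cpu_util;
}

/* Time attributed to RT, DL and IRQ: whatever is not effective. */
static uint64_t lost_ns(uint64_t total_ns, uint64_t cfs_util,
			uint64_t cpu_util)
{
	return total_ns - effective_ns(total_ns, cfs_util, cpu_util);
}
```

For example, a 10ms interval measured while cpu_util_cfs is 600 out of a total utilization of 800 would be split into 7.5ms of effective time and 2.5ms of lost time.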

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
---
changes in v2: use two parameters to pass se_prop and rq_prop out
---
include/linux/sched.h | 3 +++
kernel/sched/core.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..d6d5914fad10 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2318,6 +2318,9 @@ static inline bool owner_on_cpu(struct task_struct *owner)

/* Returns effective CPU energy utilization, as seen by the scheduler */
unsigned long sched_cpu_util(int cpu);
+/* Reports the task's and the cfs_rq's share (in percent) of the CPU's total utilization */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop);
#endif /* CONFIG_SMP */

#ifdef CONFIG_RSEQ
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..b8c29dff5d37 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7494,6 +7494,41 @@ unsigned long sched_cpu_util(int cpu)
{
return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
}
+
+/*
+ * Calculate the approximate proportion of a timing value consumed by the
+ * specified tsk and by all CFS tasks of this CPU.
+ * The caller must be aware that this is based on util_avg, which is tracked
+ * as a geometric series that decays the load by y^32 = 0.5 (period is ~1ms).
+ * That is, only a period lasting at least several ticks, or the statistics
+ * of a repeated timing value, are suitable for this helper function.
+ * This function is derived from effective_cpu_util() but does not clamp
+ * the utilization to the CPU's capacity.
+ * se_prop and rq_prop (in percent) are valid only when the return value is 1.
+ */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop)
+{
+ unsigned int cpu = task_cpu(tsk);
+ struct sched_entity *se = &tsk->se;
+ struct rq *rq = cpu_rq(cpu);
+ unsigned long util, irq, max;
+
+ if (tsk->sched_class != &fair_sched_class)
+ return 0;
+
+ max = arch_scale_cpu_capacity(cpu);
+ irq = cpu_util_irq(rq);
+
+ util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_dl(rq);
+ util = scale_irq_capacity(util, irq, max);
+ util += irq;
+
+ *se_prop = se->avg.util_avg * 100 / util;
+ *rq_prop = cpu_util_cfs(cpu) * 100 / util;
+ return 1;
+}
+
#endif /* CONFIG_SMP */

/**
--
2.25.1
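
The arithmetic performed by cfs_prop_by_util() in the patch above can be modelled in userspace as follows. This is only a sketch: the model_* names are hypothetical, the utilization values are passed in directly instead of being read from a runqueue, and the guards against a zero denominator and against irq exceeding max are assumptions that the patch itself does not contain.

```c
#include <stdbool.h>

/* Same arithmetic as the kernel's scale_irq_capacity(). */
static unsigned long model_scale_irq(unsigned long util, unsigned long irq,
				     unsigned long max)
{
	return util * (max - irq) / max;
}

/*
 * Hypothetical userspace model of cfs_prop_by_util().
 * se_util: util_avg of the task, cfs/rt/dl/irq: per-class utilization,
 * max: CPU capacity. Outputs are percentages of the total utilization.
 */
static bool model_cfs_prop(unsigned long se_util, unsigned long cfs,
			   unsigned long rt, unsigned long dl,
			   unsigned long irq, unsigned long max,
			   unsigned long *se_prop, unsigned long *rq_prop)
{
	unsigned long util;

	/* Assumption: reject inputs that would underflow or divide by zero. */
	if (max == 0 || irq > max)
		return false;

	util = model_scale_irq(cfs + rt + dl, irq, max) + irq;
	if (util == 0)
		return false;

	*se_prop = se_util * 100 / util;
	*rq_prop = cfs * 100 / util;
	return true;
}
```

With se_util = 200, cfs = 400, rt = 100 and no DL or IRQ utilization, the total is 500, so the task accounts for 40% and the CFS runqueue for 80% of the utilization.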