Re: [PATCH 1/2] sched/fair: Fix value reported by hot tasks pulled in /proc/schedstat

From: Peter Zijlstra
Date: Mon Jun 19 2023 - 05:23:03 EST


On Wed, Jun 14, 2023 at 10:22:23AM +0000, Swapnil Sapkal wrote:
> In /proc/schedstat, lb_hot_gained reports the number of hot tasks pulled
> during load balance. This value is incremented in can_migrate_task()
> if the task is migratable and hot. After incrementing the value, the
> load balancer can still decide not to migrate this task, leading to
> wrong accounting. Fix this by incrementing stats when hot tasks are
> detached. This issue only exists in detach_tasks(), where we can decide
> not to migrate a hot task even though it is migratable. However, in
> detach_one_task(), we migrate it unconditionally.
>
> Fixes: d31980846f96 ("sched: Move up affinity check to mitigate useless redoing overhead")
> Reported-by: Gautham R. Shenoy <gautham.shenoy@xxxxxxx>
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@xxxxxxx>
> ---
> kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++----------------
> 1 file changed, 30 insertions(+), 17 deletions(-)

All this for just a number hardly anybody looks at :-(

Does this also work?

Please double-check the order of the task_struct::sched_* bitfield thing;
I've not had much wake-up juice.
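
The idea is to split the decision from the accounting: can_migrate_task()
only marks the task as hot, and the counter is bumped once the task is
actually detached. As a standalone sketch of that mark-then-commit
pattern (plain C with made-up names, not the kernel code itself):

struct task {
	unsigned hot:1;			/* was cache-hot when selected */
};

static unsigned long nr_hot_pulled;	/* stands in for lb_hot_gained */

/* Decision step: may run many times per task; it only marks. */
static int can_migrate(struct task *t, int cache_hot)
{
	t->hot = 0;			/* drop stale state from earlier passes */
	if (cache_hot)
		t->hot = 1;		/* remember, but do not count yet */
	return 1;
}

/* Commit step: runs once, when the task is really moved. */
static void detach(struct task *t)
{
	if (t->hot) {
		t->hot = 0;
		nr_hot_pulled++;	/* account exactly once, on commit */
	}
}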

---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1292d38d66cc..eba0a78ac2a9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -887,6 +887,7 @@ struct task_struct {
 	unsigned			sched_reset_on_fork:1;
 	unsigned			sched_contributes_to_load:1;
 	unsigned			sched_migrated:1;
+	unsigned			sched_task_hot:1;
 
 	/* Force alignment to the next boundary: */
 	unsigned			:0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6189d1a45635..a88577132b20 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8569,6 +8569,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	int tsk_cache_hot;
 
 	lockdep_assert_rq_held(env->src_rq);
+	if (p->sched_task_hot)
+		p->sched_task_hot = 0;
 
 	/*
 	 * We do not migrate tasks that are:
@@ -8641,10 +8643,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 
 	if (tsk_cache_hot <= 0 ||
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-		if (tsk_cache_hot == 1) {
-			schedstat_inc(env->sd->lb_hot_gained[env->idle]);
-			schedstat_inc(p->stats.nr_forced_migrations);
-		}
+		if (tsk_cache_hot == 1)
+			p->sched_task_hot = 1;
 		return 1;
 	}

@@ -8659,6 +8659,12 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
 {
 	lockdep_assert_rq_held(env->src_rq);
 
+	if (p->sched_task_hot) {
+		p->sched_task_hot = 0;
+		schedstat_inc(env->sd->lb_hot_gained[env->idle]);
+		schedstat_inc(p->stats.nr_forced_migrations);
+	}
+
 	deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
 	set_task_cpu(p, env->dst_cpu);
 }