Re: [RFC PATCH] sched/fair: Choose the CPU where short task is running during wake up

From: Honglei Wang
Date: Thu Sep 29 2022 - 20:59:02 EST


Hi Prateek,


On 2022/9/30 01:34, K Prateek Nayak wrote:
Hello Honglei,

Thank you for looking into this.

On 9/29/2022 12:29 PM, Honglei Wang wrote:

[..snip..]

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 914096c5b1ae..7519ab5b911c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
      return 1;
  }
 
+/*
+ * If a task switches in and then voluntarily relinquishes the
+ * CPU quickly, it is regarded as a short running task.
+ * sysctl_sched_min_granularity is chosen as the threshold,
+ * as this value is the minimal slice if there are too many
+ * runnable tasks, see __sched_period().
+ */
+static int is_short_task(struct task_struct *p)
+{
+    return (p->se.sum_exec_runtime <=
+        (p->nvcsw * sysctl_sched_min_granularity));
+}
+
  /*
   * The purpose of wake_affine() is to quickly determine on which CPU we can run
   * soonest. For the purpose of speed we only consider the waking and previous
@@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
      if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
          return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
 
-    if (sync && cpu_rq(this_cpu)->nr_running == 1)
+    if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
+        is_short_task(cpu_curr(this_cpu)))

This seems to break the idle (or about-to-be-idle) purpose of wake_affine_idle() a bit. Maybe we could do something like this?

if ((sync || is_short_task(cpu_curr(this_cpu))) && cpu_rq(this_cpu)->nr_running == 1)
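
To make the difference between the two conditions concrete, here is a minimal userspace sketch, not kernel code. The function names patch_cond() and suggested_cond() are made up for illustration, and the three parameters stand in for sync, cpu_rq(this_cpu)->nr_running and is_short_task(cpu_curr(this_cpu)).

```c
#include <stdbool.h>

/* Condition as written in the RFC patch: a short current task alone
 * is enough to pick this_cpu, regardless of the queue length. */
bool patch_cond(bool sync, unsigned int nr_running, bool curr_short)
{
    return (sync && nr_running == 1) || curr_short;
}

/* Condition as suggested in this reply: either hint is accepted,
 * but only when the current task is the sole runnable task. */
bool suggested_cond(bool sync, unsigned int nr_running, bool curr_short)
{
    return (sync || curr_short) && nr_running == 1;
}
```

With sync = 0, nr_running = 3 and a short current task, patch_cond() returns true while suggested_cond() returns false. With nr_running = 1 and the same other inputs, both return true.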

I believe this will still cause performance degradation on split-LLC
systems for Stream-like workloads. Based on the logs below, we can
have a situation as follows:

stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)

Here sync is 0, but is_short_task() returns 1 and the
current rq->nr_running is 1. This will lead to two Stream threads
getting placed on the same LLC during wakeup, which will cause cache
contention and performance degradation.
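
For reference, the is_short_task(1) value in the log comes from comparing a task's accumulated runtime against its voluntary context switch count times the minimum granularity, which amounts to checking the average runtime per voluntary switch. Below is a minimal userspace sketch of that comparison, not kernel code. struct mock_task, mock_is_short_task() and MOCK_MIN_GRANULARITY_NS are hypothetical names, and the 750000 ns threshold is an assumed value for illustration only; the kernel's effective sysctl_sched_min_granularity may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task_struct fields the patch reads. */
struct mock_task {
    uint64_t sum_exec_runtime_ns; /* total CPU time consumed, in ns */
    uint64_t nvcsw;               /* number of voluntary context switches */
};

/* Assumed threshold for illustration; not read from a running kernel. */
#define MOCK_MIN_GRANULARITY_NS 750000ULL

/* Same comparison as is_short_task() in the patch: the task is "short"
 * when its total runtime does not exceed nvcsw * min_granularity. */
bool mock_is_short_task(const struct mock_task *p)
{
    return p->sum_exec_runtime_ns <= p->nvcsw * MOCK_MIN_GRANULARITY_NS;
}
```

Under these assumptions, a task that ran for 100 ms in total across 1000 voluntary switches counts as short, while one that ran for 2 s with only 10 voluntary switches does not. A task with zero runtime and zero switches also satisfies the comparison.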


What I meant was that we should not break the purpose of wake_affine_idle(). The 'nr_running == 1' check makes sure there won't be a long queue here, which might help in the benchmark tests as well. The short code snippet I sent was probably not well thought out; it was only meant as a hint.

I see your test result in another mail. It's great and is exactly what I was thinking we should test.

Thanks,
Honglei


This change seems to optimize for affine wakeup, which benefits
tasks with a producer-consumer pattern but is not ideal for Stream.
Currently the logic ends up doing an affine wakeup even if the sync
flag is not set:

           stream-4135    [029] d..2.   353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
           stream-4135    [029] d..2.   353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
           stream-4135    [029] d..2.   353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
           <idle>-0       [030] dNh2.   353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030

This is the exact situation observed during our testing.


[..snip..]
--
Thanks and Regards,
Prateek