Hello Honglei,
Thank you for looking into this.
On 9/29/2022 12:29 PM, Honglei Wang wrote:
[..snip..]
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 914096c5b1ae..7519ab5b911c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
return 1;
}
+/*
+ * If a task switches in and then voluntarily relinquishes the
+ * CPU quickly, it is regarded as a short running task.
+ * sysctl_sched_min_granularity is chosen as the threshold,
+ * as this value is the minimal slice if there are too many
+ * runnable tasks, see __sched_period().
+ */
+static int is_short_task(struct task_struct *p)
+{
+ return (p->se.sum_exec_runtime <=
+ (p->nvcsw * sysctl_sched_min_granularity));
+}
+
/*
* The purpose of wake_affine() is to quickly determine on which CPU we can run
* soonest. For the purpose of speed we only consider the waking and previous
@@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
- if (sync && cpu_rq(this_cpu)->nr_running == 1)
+ if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
+ is_short_task(cpu_curr(this_cpu)))
Seems it a bit breaks idle (or will be idle) purpose of wake_affine_idle() here. Maybe we can do it something like this?
if ((sync || is_short_task(cpu_curr(this_cpu))) && cpu_rq(this_cpu)->nr_running == 1)
I believe this will still cause performance degradation on split-LLC
system for Stream like workloads. Based on the logs below, we can
have a situation as follows:
stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
Where sync is 0 but is_short_task() may return 1 and the
current_rq->nr_running is 1. This will lead to two Stream threads
getting placed on same LLC during wakeup which will cause cache
contention and performance degradation.
Thanks,
Honglei
This change seems to optimize for affine wakeup which benefits
tasks with producer-consumer pattern but is not ideal for Stream.
Currently the logic ends will do an affine wakeup even if sync
flag is not set:
stream-4135 [029] d..2. 353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
stream-4135 [029] d..2. 353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
<idle>-0 [030] dNh2. 353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
This is the exact situation observed during our testing.
--
[..snip..]
Thanks and Regards,
Prateek