[PATCH v4 3/3] sched: optimize migration by calling rmb() and updating last_update_time only once

From: byungchul.park
Date: Fri Oct 23 2015 - 12:17:32 EST


From: Byungchul Park <byungchul.park@xxxxxxx>

the "sched/fair: make it possible to account fair load avg consistently"
patch makes rmb() and updating last_update_time called twice when doing a
migration, which can be negative at performance. actually we can optimize
it by omiting the updating part of remove_entity_load_avg().

we can optimize it by changing the order of migrate_task_rq() and
and __set_task_cpu(), and removing the update part of
remove_entity_load_avg(). by this, we can ensure updating was already
done when the migrate_task_rq() is called.

this patch also changes the migrate_task_rq()'s second argument from
new_cpu to prev_cpu because the migrate_task_rq() is changed to be called
after setting rq. additionally, this patch changes comment properly.
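
In essence, the reordering in set_task_cpu() looks like this (a simplified
sketch distilled from the core.c hunk below; the debug checks, tracepoint
and perf accounting are omitted):

	void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
	{
		unsigned int prev_cpu = task_cpu(p);

		/* Point task_cpu(p) at the new cpu before notifying the class. */
		__set_task_cpu(p, new_cpu);

		if (prev_cpu != new_cpu) {
			/*
			 * The callback can no longer derive the old cpu from
			 * task_cpu(p), so pass prev_cpu explicitly.
			 */
			if (p->sched_class->migrate_task_rq)
				p->sched_class->migrate_task_rq(p, prev_cpu);
			p->se.nr_migrations++;
		}
	}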

Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
---
kernel/sched/core.c | 9 +++++----
kernel/sched/fair.c | 27 ++++++++++++++-------------
2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0368054..99b05d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1264,6 +1264,8 @@ EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);

void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
{
+ unsigned int prev_cpu = task_cpu(p);
+
#ifdef CONFIG_SCHED_DEBUG
/*
* We should never call set_task_cpu() on a blocked task,
@@ -1289,15 +1291,14 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
#endif

trace_sched_migrate_task(p, new_cpu);
+ __set_task_cpu(p, new_cpu);

- if (task_cpu(p) != new_cpu) {
+ if (prev_cpu != new_cpu) {
if (p->sched_class->migrate_task_rq)
- p->sched_class->migrate_task_rq(p, new_cpu);
+ p->sched_class->migrate_task_rq(p, prev_cpu);
p->se.nr_migrations++;
perf_event_task_migrate(p);
}
-
- __set_task_cpu(p, new_cpu);
}

static void __migrate_swap_task(struct task_struct *p, int cpu)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 522aa07..c9caf83 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5060,21 +5060,22 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
}

/*
- * Called immediately before a task is migrated to a new cpu; task_cpu(p) and
- * cfs_rq_of(p) references at time of call are still valid and identify the
- * previous cpu. However, the caller only guarantees p->pi_lock is held; no
- * other assumptions, including the state of rq->lock, should be made.
+ * Called immediately after a task is migrated to a new cpu; task_cpu(p) and
+ * cfs_rq_of(p) references at time of call identify the next cpu. However,
+ * the caller only guarantees p->pi_lock is held; no other assumptions,
+ * including the state of rq->lock, should be made.
*/
-static void migrate_task_rq_fair(struct task_struct *p, int next_cpu)
+static void migrate_task_rq_fair(struct task_struct *p, int prev_cpu)
{
- /*
- * We are supposed to update the task to "current" time, then its up to date
- * and ready to go to new CPU/cfs_rq. But we have difficulty in getting
- * what current time is, so simply throw away the out-of-date time. This
- * will result in the wakee task is less decayed, but giving the wakee more
- * load sounds not bad.
- */
- remove_entity_load_avg(&p->se, cfs_rq_of(&p->se));
+#ifdef CONFIG_CGROUP_SCHED
+ struct cfs_rq *cfs_rq = task_group(p)->cfs_rq[prev_cpu];
+#else
+ struct cfs_rq *cfs_rq = cpu_rq(prev_cpu)->cfs;
+#endif
+
+ if (!sched_feat(ATTACH_AGE_LOAD))
+ __update_entity_load_avg(&p->se, cfs_rq);
+ __remove_entity_load_avg(&p->se, cfs_rq);

/* Tell new CPU we are migrated */
p->se.avg.last_update_time = 0;
--
1.7.9.5
