Re: [RFC PATCH V2] sched: Improve scalability of select_idle_sibling using SMT balance

From: Subhra Mazumdar
Date: Wed Jan 10 2018 - 21:09:08 EST




On 01/09/2018 06:50 AM, Steven Sistare wrote:
> On 1/8/2018 5:18 PM, Peter Zijlstra wrote:
>> On Mon, Jan 08, 2018 at 02:12:37PM -0800, subhra mazumdar wrote:
>>> @@ -2751,6 +2763,31 @@ context_switch(struct rq *rq, struct task_struct *prev,
>>>  	       struct task_struct *next, struct rq_flags *rf)
>>>  {
>>>  	struct mm_struct *mm, *oldmm;
>>> +	int this_cpu = rq->cpu;
>>> +	struct sched_domain *sd;
>>> +	int prev_busy, next_busy;
>>> +
>>> +	if (rq->curr_util == UTIL_UNINITIALIZED)
>>> +		prev_busy = 0;
>>> +	else
>>> +		prev_busy = (prev != rq->idle);
>>> +	next_busy = (next != rq->idle);
>>> +
>>> +	/*
>>> +	 * From sd_llc downward update the SMT utilization.
>>> +	 * Skip the lowest level 0.
>>> +	 */
>>> +	sd = rcu_dereference_sched(per_cpu(sd_llc, this_cpu));
>>> +	if (next_busy != prev_busy) {
>>> +		for_each_lower_domain(sd) {
>>> +			if (sd->level == 0)
>>> +				break;
>>> +			sd_context_switch(sd, rq, next_busy - prev_busy);
>>> +		}
>>> +	}
>>> +
>> No, we're not going to be adding atomic ops here. We've been arguing
>> over adding a single memory barrier to this path, atomic are just not
>> going to happen.
>>
>> Also this is entirely the wrong way to do this, we already have code
>> paths that _know_ if they're going into or coming out of idle.
> Yes, it would be more efficient to adjust the busy-cpu count of each level
> of the hierarchy in pick_next_task_idle and put_prev_task_idle.
OK, I have moved it to pick_next_task_idle/put_prev_task_idle. Will send out the v3.
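
To make the intent concrete, here is a rough user-space sketch of the
idea, not the v3 code. Everything in it is an assumption for
illustration: struct domain, domain_adjust_busy() and the model_*
helpers are hypothetical stand-ins, not kernel symbols. It only shows
that the per-level busy count changes on idle transitions rather than on
every context switch, and that level 0 is still skipped as in the hunk
quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the domain hierarchy;
 * this is NOT the kernel's struct sched_domain. */
struct domain {
	int level;             /* 0 is the lowest level */
	int busy_cpus;         /* CPUs in this domain running a non-idle task */
	struct domain *child;  /* next lower level, or NULL */
};

/* Walk from the given level downward and adjust the busy count,
 * stopping before level 0 as the quoted hunk does. */
static void domain_adjust_busy(struct domain *top, int delta)
{
	struct domain *d;

	for (d = top; d != NULL && d->level != 0; d = d->child) {
		d->busy_cpus += delta;
		assert(d->busy_cpus >= 0);  /* the count must never underflow */
	}
}

/* Models the point where the idle task is switched out:
 * the CPU is coming out of idle, so it becomes busy. */
static void model_put_prev_task_idle(struct domain *top)
{
	domain_adjust_busy(top, 1);
}

/* Models the point where the idle task is picked:
 * the CPU is going idle, so it stops being busy. */
static void model_pick_next_task_idle(struct domain *top)
{
	domain_adjust_busy(top, -1);
}
```

With a two-level hierarchy (level 1 on top of level 0), calling
model_put_prev_task_idle() on the top level raises its busy_cpus by one
and leaves level 0 untouched; a later model_pick_next_task_idle() brings
it back down. Switches between two non-idle tasks never reach this code.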

Thanks,
Subhra

> - Steve