Re: [RFC PATCH] sched/fair: Introduce SIS_PAIR to wakeup task on local idle core first

From: K Prateek Nayak
Date: Wed May 17 2023 - 23:31:17 EST


Hello Chenyu,

I'll do some light testing with a few benchmarks and share the results on
the thread, but meanwhile I have a few observations about the implementation.

On 5/17/2023 10:27 PM, Chen Yu wrote:
> On 2023-05-16 at 13:51:26 +0200, Mike Galbraith wrote:
>> On Tue, 2023-05-16 at 16:41 +0800, Chen Yu wrote:
>> [..snip..]
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7d2613ab392c..572d663065e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7126,6 +7126,23 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> asym_fits_cpu(task_util, util_min, util_max, target))
> return target;
>
> + /*
> + * If the waker and the wakee are good friends to each other,
> + * putting them within the same SMT domain could reduce C2C
> + * overhead. But this only applies when there is no idle core
> + * available. SMT idle sibling should be prefered to wakee's
> + * previous CPU, because the latter could still have the risk of C2C
> + * overhead.
> + *
> + */
> + if (sched_feat(SIS_PAIR) && sched_smt_active() && !has_idle_core &&

"has_idle_core" is not populated at this point and will always be false
from the initialization. Should there be a:

has_idle_core = test_idle_cores(? /* Which CPU? */);
if (sched_feat(SIS_PAIR) ...) {
...
}
has_idle_core = false;

?: "has_idle_core" is currently used in select_idle_sibling() from the
perspective of the target MC. Does switching target to current core
(which may not be on the same MC) if target MC does not have an idle core
make sense in all scenarios?
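
To make the above concrete, this is roughly the shape I had in mind. It
is only a sketch: smp_processor_id() stands in for whichever CPU turns
out to be the right one to pass to test_idle_cores():

	has_idle_core = test_idle_cores(smp_processor_id());

	if (sched_feat(SIS_PAIR) && sched_smt_active() && !has_idle_core &&
	    current->last_wakee == p && p->last_wakee == current) {
		/* Prefer an idle SMT sibling of the waker's core */
		i = select_idle_smt(p, smp_processor_id());

		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}

	/* Reset so the rest of select_idle_sibling() behaves as today */
	has_idle_core = false;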

> + current->last_wakee == p && p->last_wakee == current) {
> + i = select_idle_smt(p, smp_processor_id());

Also wondering if an asym_fits_cpu() check is needed in some way here.
Consider a case where the waker is on a lower-capacity CPU but the wakee
previously ran on a higher-capacity CPU. It might be worthwhile to wake
the wakee on its previous CPU if the current CPU does not fit the task's
utilization, and to move the pair to the larger-capacity CPU during the
next wakeup. wake_affine_weight() would select a target based on load and
capacity considerations, but here we switch the wakeup target to a thread
on the current core.

Wondering if the capacity details are already considered in this path?
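
If not, something like the below is roughly what I was picturing, again
purely as a sketch. It reuses the task_util, util_min and util_max values
that are already passed to the asym_fits_cpu() check against "target"
earlier in select_idle_sibling(), and smp_processor_id() is once more
only a stand-in for the waker's CPU:

	if (sched_feat(SIS_PAIR) && sched_smt_active() && !has_idle_core &&
	    current->last_wakee == p && p->last_wakee == current &&
	    asym_fits_cpu(task_util, util_min, util_max, smp_processor_id())) {
		/* Only pair up if the waker's CPU fits the wakee's utilization */
		i = select_idle_smt(p, smp_processor_id());

		if ((unsigned int)i < nr_cpumask_bits)
			return i;
	}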

> +
> + if ((unsigned int)i < nr_cpumask_bits)
> + return i;
> + }
> +
> /*
> * If the previous CPU is cache affine and idle, don't be stupid:
> */
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index ee7f23c76bd3..86b5c4f16199 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -62,6 +62,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
> */
> SCHED_FEAT(SIS_PROP, false)
> SCHED_FEAT(SIS_UTIL, true)
> +SCHED_FEAT(SIS_PAIR, true)
>
> /*
> * Issue a WARN when we do multiple update_rq_clock() calls

--
Thanks and Regards,
Prateek