Re: [tip: sched/core] sched/fair: Multi-LLC select_idle_sibling()

From: Peter Zijlstra
Date: Thu Jun 01 2023 - 04:09:35 EST


On Thu, Jun 01, 2023 at 11:41:14AM +0800, Abel Wu wrote:
> On 5/31/23 8:04 PM, tip-bot2 for Peter Zijlstra wrote:
> > The following commit has been merged into the sched/core branch of tip:
> >
> > Commit-ID: c7dfd6b9122d29d0e9a4587ab470c0564d7f92ab
> > Gitweb: https://git.kernel.org/tip/c7dfd6b9122d29d0e9a4587ab470c0564d7f92ab
> > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > AuthorDate: Tue, 30 May 2023 13:20:46 +02:00
> > Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > CommitterDate: Tue, 30 May 2023 22:46:27 +02:00
> >
> > sched/fair: Multi-LLC select_idle_sibling()
> >
> > Tejun reported that when he targets workqueues towards a specific LLC
> > on his Zen2 machine with 3 cores / LLC and 4 LLCs in total, he gets
> > significant idle time.
> >
> > This is, of course, because of how select_idle_sibling() will not
> > consider anything outside of the local LLC, and since all these tasks
> > are short running the periodic idle load balancer is ineffective.
> >
> > And while it is good to keep work cache local, it is better to not
> > have significant idle time. Therefore, have select_idle_sibling() try
> > other LLCs inside the same node when the local one comes up empty.
> >
> > Reported-by: Tejun Heo <tj@xxxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 38 ++++++++++++++++++++++++++++++++++++++
> > kernel/sched/features.h | 1 +
> > 2 files changed, 39 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 48b6f0c..0172458 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7028,6 +7028,38 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  }
> >
> >  /*
> > + * For the multiple-LLC per node case, make sure to try the other LLC's if the
> > + * local LLC comes up empty.
> > + */
> > +static int
> > +select_idle_node(struct task_struct *p, struct sched_domain *sd, int target)
> > +{
> > +	struct sched_domain *parent = sd->parent;
> > +	struct sched_group *sg;
> > +
> > +	/* Make sure to not cross nodes. */
> > +	if (!parent || parent->flags & SD_NUMA)
> > +		return -1;
> > +
> > +	sg = parent->groups;
> > +	do {
> > +		int cpu = cpumask_first(sched_group_span(sg));
> > +		struct sched_domain *sd_child;
> > +
> > +		sd_child = per_cpu(sd_llc, cpu);
> > +		if (sd_child != sd) {
>
> Since sd_llc is cpu private, I think it should be:
>
> if (!cpus_share_cache(cpu, target))

Hmm, yes... either that, or ensure sd is taken from the first CPU in its own
mask so the pointer comparison against sd_child still means something.
Let me go fix that.
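
Something like the below, perhaps. This is a quick, completely untested sketch
of the loop using your cpus_share_cache() check; the tail of the loop isn't in
the hunk quoted above, so the select_idle_cpu() / test_idle_cores() part here
is illustrative rather than a copy of the committed code:

	sg = parent->groups;
	do {
		int cpu = cpumask_first(sched_group_span(sg));

		/* Only probe groups that do not share cache with @target. */
		if (!cpus_share_cache(cpu, target)) {
			struct sched_domain *sd_child = per_cpu(sd_llc, cpu);
			int i = select_idle_cpu(p, sd_child, test_idle_cores(cpu), cpu);

			if ((unsigned)i < nr_cpumask_bits)
				return i;
		}

		sg = sg->next;
	} while (sg != parent->groups);

	return -1;

The other option keeps the pointer comparison but normalizes it to the LLC
domain of the first CPU in sd's own span, e.g.:

	/* sd_llc is per-CPU; compare against the instance of the span's first CPU. */
	if (sd_child != per_cpu(sd_llc, cpumask_first(sched_domain_span(sd))))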

Thanks!