Re: [PATCH v2 2/2] sched/fair: Scan cluster before scanning LLC in wake-up path

From: Srikar Dronamraju
Date: Fri Feb 04 2022 - 02:34:31 EST


* Barry Song <21cnbao@xxxxxxxxx> [2022-02-02 09:20:32]:

> On Tue, Feb 1, 2022 at 10:39 PM Srikar Dronamraju
> <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > * Barry Song <21cnbao@xxxxxxxxx> [2022-01-28 07:40:15]:
> >
> > > On Fri, Jan 28, 2022 at 8:13 PM Srikar Dronamraju
> > > <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > * Barry Song <21cnbao@xxxxxxxxx> [2022-01-28 09:21:08]:
> > > >
> > > > > On Fri, Jan 28, 2022 at 4:41 AM Gautham R. Shenoy
> > > > > <gautham.shenoy@xxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jan 26, 2022 at 04:09:47PM +0800, Yicong Yang wrote:
> > > > > > > From: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > > > > > >
> > > I am sorry I didn't get your question. Currently the code works as below:
> > > if task A wakes up task B, and task A is in LLC0 and task B is in LLC1.
> > > we will scan the cluster of A before scanning the whole LLC0, in this case,
> > > cluster of A is the closest sibling, so it is the better choice than other CPUs
> > > which are in LLC0 but not in the cluster of A.
> >
> > Yes, this is right.
> >
> > > But we do scan all cpus of LLC0
> > > afterwards if we fail to find an idle CPU in the cluster.
> >
> > However my reading of the patch, before we can scan other clusters within
> > the LLC (aka LLC0), we have a check in scan cluster which says
> >
> > /* Don't ping-pong tasks in and out cluster frequently */
> > if (cpus_share_resources(target, prev_cpu))
> > return target;
> >
> > My reading of this is, ignore other clusters (at this point, we know there
> > are no idle CPUs in this cluster. We don't know if there are idle cpus in
> > them or not) if the previous CPU and target CPU happen to be from the same
> > cluster. This effectively means we are given preference to cache over idle
> > CPU.
>
> Note we only ignore other cluster while prev_cpu and target are in same
> cluster. if the condition is false, we are not ignoring other cpus. typically,
> if waker is the target, and wakee is the prev_cpu, that means if they are
> already in one cluster, we don't stupidly spread them in select_idle_cpu() path
> as benchmark shows we are losing. so, yes, we are giving preference to
> cache over CPU.

We already figured out that there are no idle CPUs in this cluster. So dont
we gain performance by picking a idle CPU/core in the neighbouring cluster.
If there are no idle CPU/core in the neighbouring cluster, then it does make
sense to fallback on the current cluster.

>
> >
> > Or Am I still missing something?
> >
> > >
> > > After a while, if the cluster of A gets an idle CPU and pulls B into the
> > > cluster, we prefer not pushing B out of the cluster of A again though
> > > there might be an idle CPU outside. as benchmark shows getting an
> > > idle CPU out of the cluster of A doesn't bring performance improvement
> > > but performance decreases as B might be getting in and getting out
> > > the cluster of A very frequently, then cache coherence ping-pong.
> > >
> >
> > The counter argument can be that Task A and Task B are related and were
> > running on the same cluster. But Load balancer moved Task B to a different
> > cluster. Now this check may cause them to continue to run on two different
> > clusters, even though the underlying load balance issues may have changed.
> >
> > No?
>
> LB is much slower than select_idle_cpu(). select_idle_cpu() can dynamically
> work afterwards. so it is always a dynamic balance and task migration.
>
> >
> >
> > --
> > Thanks and Regards
> > Srikar Dronamraju
>
> Thanks
> Barry

--
Thanks and Regards
Srikar Dronamraju