Re: [RFC PATCH v3 2/3] sched: Introduce cpus_share_l2c

From: Aaron Lu
Date: Mon Aug 28 2023 - 07:20:58 EST


On Fri, Aug 25, 2023 at 09:51:19AM -0400, Mathieu Desnoyers wrote:
> On 8/25/23 02:49, Aaron Lu wrote:
> > On Thu, Aug 24, 2023 at 10:40:45AM -0400, Mathieu Desnoyers wrote:
> [...]
> > > > - task migrations dropped with this series for nr_group=20 and 32
> > > > according to 'perf stat'. The migration number didn't drop for
> > > > nr_group=10, but the two update functions' cost dropped, which implies
> > > > fewer accesses to tg->load_avg and thus fewer task migrations. This is
> > > > contradictory and I cannot explain it yet;
> > >
> > > Neither can I.
> > >
>
> [...]
>
> > >
> > > > It's not clear to me why this series can reduce task migrations. I
> > > > suspect it has something to do with more wakelist style wakeups,
> > > > because on this test machine only a single core with two SMT threads
> > > > shares L2, so more wakeups go through the wakelist. In a wakelist style
> > > > wakeup, the target rq's ttwu_pending is set, which makes the target cpu
> > > > fail idle_cpu(). That takes effect sooner than grabbing the target rq's
> > > > lock and then increasing the target rq's nr_running or setting the
> > > > target rq's curr to something other than idle. So wakelist style
> > > > wakeups make the target cpu appear non-idle sooner, but I can't connect
> > > > this with the reduced migrations yet; I just feel this might be the
> > > > reason why task migrations dropped.
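
For reference, the "appear non-idle sooner" part comes from idle_cpu()
taking rq->ttwu_pending into account. Roughly, paraphrasing
kernel/sched/core.c (details vary by kernel version):

int idle_cpu(int cpu)
{
        struct rq *rq = cpu_rq(cpu);

        /* Someone other than the idle task is (about to be) running. */
        if (rq->curr != rq->idle)
                return 0;

        if (rq->nr_running)
                return 0;

        /*
         * A queued wakeup was sent to this CPU but hasn't been
         * processed yet: treat the CPU as busy already.
         */
        if (rq->ttwu_pending)
                return 0;

        return 1;
}

So a remote CPU doing wakeup CPU selection sees the target as busy as
soon as the wakeup is queued, before the target has even taken its rq
lock.
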
> > >
> >
> [...]
> > > I've tried adding checks for rq->ttwu_pending in those code paths on top of
> > > my patch and I'm still observing the reduction in number of migrations, so
> > > it's unclear to me how doing more queued wakeups can reduce migrations the
> > > way it does.
> >
> > An interesting puzzle.
>
> One metric that can help in understanding the impact of my patch:
> comparing hackbench on a baseline where only your load_avg patch is
> applied against a kernel with my l2c patch applied, I notice that the
> goidle schedstat is cut in half. For a given CPU (they are all pretty
> much alike), it goes from 650456 to 353487.
>
> So could it be that by doing queued wakeups, we end up batching
> execution of the woken-up tasks on a given CPU, rather than going back
> and forth between idle and non-idle? One important effect of this is to
> reduce the number of newidle balances triggered.

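To make the goidle/newidle connection concrete: sched_goidle counts how
often a CPU switches to the idle task, and CFS only attempts a newidle
balance right before that happens. Very roughly, paraphrasing the tail
of pick_next_task_fair() (exact code depends on the kernel version):

idle:
        new_tasks = newidle_balance(rq, rf);    /* try to pull work before idling */

        if (new_tasks < 0)
                return RETRY_TASK;      /* a higher class task appeared, re-pick */

        if (new_tasks > 0)
                goto again;             /* pulled fair tasks, pick again */

        return NULL;                    /* really go idle; picking the idle task
                                           is what increments sched_goidle */

So halving goidle directly translates into fewer newidle balance
attempts.
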
I noticed the vast majority (>99%) of migrations come from the wakeup
path on this Intel SPR when running hackbench: ttwu() -> set_task_cpu()
-> migrate_task_rq_fair(). So while I think the drop in newidle balance
is a good finding, it's probably not the reason why the migration number
dropped...
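
For completeness, the accounting on that path looks roughly like this,
paraphrasing set_task_cpu() in kernel/sched/core.c (details vary by
kernel version):

void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
{
        /* lockdep and task state sanity checks elided */

        if (task_cpu(p) != new_cpu) {
                if (p->sched_class->migrate_task_rq)
                        /* for CFS tasks this is migrate_task_rq_fair() */
                        p->sched_class->migrate_task_rq(p, new_cpu);
                p->se.nr_migrations++;
                rseq_migrate(p);
                /* this is what ends up in perf stat's cpu-migrations count */
                perf_event_task_migrate(p);
        }

        __set_task_cpu(p, new_cpu);
}

In other words, every wakeup that selects a CPU different from the
task's previous CPU counts as a migration here, with no load balancer
involved.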

Thanks,
Aaron