Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS

From: David Vernet
Date: Wed Jun 21 2023 - 21:43:41 EST


On Wed, Jun 21, 2023 at 01:47:00PM +0530, Gautham R. Shenoy wrote:
> Hello David,
> On Tue, Jun 20, 2023 at 03:08:22PM -0500, David Vernet wrote:
> > On Mon, Jun 19, 2023 at 11:43:13AM +0530, Gautham R. Shenoy wrote:
> > > Hello David,
> > >
> > >
> > > On Tue, Jun 13, 2023 at 12:20:04AM -0500, David Vernet wrote:
> > > [..snip..]
> > >
> > > > +static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
> > > > +{
> > > > + unsigned long flags;
> > > > + struct swqueue *swqueue;
> > > > + bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
> > > > + bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
> > > > +
> > > > + /*
> > > > + * Only enqueue the task in the shared wakequeue if:
> > > > + *
> > > > + * - SWQUEUE is enabled
> > > > + * - The task is on the wakeup path
> > > > + * - The task wasn't purposefully migrated to the current rq by
> > > > + * select_task_rq()
> > > > + * - The task isn't pinned to a specific CPU
> > > > + */
> > > > + if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
> > > > + return;
> > >
> > > In select_task_rq_fair(), having determined if the target of task
> > > wakeup should be the task's previous CPU vs the waker's current CPU,
> > > we spend quite a bit of time already to determine if there is an idle
> > > core/CPU in the target's LLC. @rq would correspond to CPU chosen as a
> > > result of that scan or if no idle CPU exists, @rq corresponds to the
> > > target CPU determined by wake_affine_idle()/wake_affine_weight().
> > >
> > > So if the CPU of @rq is idle here, can we not simply return here?
> >
> > Hi Gautham,
> >
> > Sorry, I'm not sure I'm quite following the issue you're pointing out.
> > We don't use swqueue if the task was migrated following
> > select_task_rq_fair(). That's the idea with us returning if the task was
> > migrated (the second conditional in that if). If I messed up that logic
> > somehow, it should be fixed.
>
> Sorry, my bad. I see it now.
>
> So as per this patch, the only time we enqueue the task on the shared
> wakequeue is if the target of try_to_wake_up() is the same CPU where
> the task ran previously.
>
> When wake_affine logic fails and the previous CPU is chosen as the
> target, and when there are no other idle cores/threads in the LLC of
> the previous CPU, it makes sense to queue the task on the
> shared-wakequeue instead of on a busy previous CPU.
>
> And when that previous CPU is idle, the try_to_wake_up() would have
> woken it up via ttwu_queue(), so before going idle the next time it
> will check the shared queue for the task and find it. We should be
> good in this case.
>
> Now, it is possible that select_task_rq_fair() ended up selecting the
> waker's CPU as the target based on the
> wake_affine_idle()/wake_affine_weight() logic. And if there is no idle
> core/thread on the waker's LLC, the target would be the busy waker
> CPU. In the case when the waker CPU is different from the task's
> previous CPU, due to the ENQUEUE_MIGRATED flag being set, the task won't be
> queued on the shared wakequeue and instead has to wait on the busy
> waker CPU.
>
> I wonder if it makes sense to enqueue the task on the shared wakequeue
> in this scenario as well.

Hello Gautham,

That's a good point. My original intention in opting out of using
swqueue when select_task_rq_fair() caused a migration was to avoid
interfering with decisions made by other select_task_rq_fair()
heuristics such as wake_affine_*(); basically, minimizing the possible
impact of swqueue. That said, I think it probably does make sense to
enqueue in the swqueue regardless of whether ENQUEUE_MIGRATED is set.
One of the main goals of swqueue is work conservation, and in hindsight
it does feel somewhat artificial to add a heuristic that works against
that.

I'd like to hear what others think. In my opinion it's worth at least
running some tests on CPU-intensive workloads, such as a kernel
compile, and seeing what the outcomes are.

Thanks,
David