Re: [PATCH v2 2/3] sched/fair: Calculate the cache-hot time of the idle CPU

From: Chen Yu
Date: Sun Nov 26 2023 - 02:22:10 EST


Hi Madadi,

On 2023-11-25 at 12:40:18 +0530, Madadi Vineeth Reddy wrote:
> Hi Chen Yu,
>
> On 21/11/23 13:09, Chen Yu wrote:
> > When a CPU is about to become idle due to task dequeue, uses
> > the dequeued task's average sleep time to set the cache
> > hot timeout of this idle CPU. This information can facilitate
> > SIS to skip the cache-hot idle CPU and scan for the next
> > cache-cold one. When that task is woken up again, it can choose
> > its previous CPU and reuses its hot-cache.
> >
> > This is a preparation for the next patch to introduce SIS_CACHE
> > based task wakeup.
> >
> > Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 30 +++++++++++++++++++++++++++++-
> > kernel/sched/features.h | 1 +
> > kernel/sched/sched.h | 1 +
> > 3 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 672616503e35..c309b3d203c0 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6853,8 +6853,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > util_est_update(&rq->cfs, p, task_sleep);
> >
> > if (task_sleep) {
> > - p->last_dequeue_time = sched_clock_cpu(cpu_of(rq));
> > + u64 now = sched_clock_cpu(cpu_of(rq));
> > +
> > + p->last_dequeue_time = now;
> > p->last_dequeue_cpu = cpu_of(rq);
> > +
> > +#ifdef CONFIG_SMP
> > + /* this rq becomes idle, update its cache hot timeout */
> > + if (sched_feat(SIS_CACHE) && !rq->nr_running &&
> > + p->avg_hot_dur)
> > + rq->cache_hot_timeout = max(rq->cache_hot_timeout, now + p->avg_hot_dur);
>
> As per the discussion in the rfc patch, you mentioned that SIS_CACHE only honors the average sleep time
> of the latest dequeued task and that we don't know how much of the cache is polluted by the latest task.
>
> So I was wondering what made you to put max here.
>

Thanks for taking a look. Yes, previously SIS_CACHE only honored the latest dequeued task.
But as Mathieu pointed out[1], the latest dequeued task might not run long enough to overwrite
the cache footprint of older dequeued tasks, so we should still honor the sleep time of
those older tasks. Consider the following scenario:

Task p1 is dequeued with an average sleep time of 2 msec. Then p2 is scheduled in
on this CPU, but only runs for 10 us (so it does not pollute the cache footprint) and is
dequeued with an average sleep time of 1 msec. Should we set the CPU runqueue's timeout
to 2 msec or 1 msec later? We choose 2 msec: max() keeps the later of the two deadlines.
The idea is that the timeout only moves forward, which makes it easier for SIS_CACHE
to wake p1 up on its previous CPU.
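
To make the arithmetic concrete, here is a minimal userspace sketch of that
scenario. It is not kernel code: struct toy_rq, toy_dequeue_to_idle() and
toy_replay_scenario() are hypothetical names used only for illustration, the
start time t0 is arbitrary, and the sched_feat(SIS_CACHE) and !rq->nr_running
checks from the hunk are assumed to have passed already.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC	1000ULL
#define NSEC_PER_MSEC	1000000ULL

/* Hypothetical stand-in for the single rq field the hunk updates. */
struct toy_rq {
	uint64_t cache_hot_timeout;	/* absolute time, in ns */
};

/*
 * Mirrors the max() update from the hunk: the timeout is only ever
 * pushed later, never pulled earlier. An avg_hot_dur of 0 is treated
 * as "no history", as in the patch.
 */
static void toy_dequeue_to_idle(struct toy_rq *rq, uint64_t now,
				uint64_t avg_hot_dur)
{
	uint64_t candidate;

	if (!avg_hot_dur)
		return;

	candidate = now + avg_hot_dur;
	if (candidate > rq->cache_hot_timeout)
		rq->cache_hot_timeout = candidate;
}

/* Replays p1 then p2; returns the final timeout relative to t0, in ns. */
static uint64_t toy_replay_scenario(void)
{
	const uint64_t t0 = 5 * NSEC_PER_MSEC;	/* arbitrary start time */
	struct toy_rq rq = { .cache_hot_timeout = 0 };

	/* p1 is dequeued at t0 with an average sleep time of 2 msec. */
	toy_dequeue_to_idle(&rq, t0, 2 * NSEC_PER_MSEC);
	assert(rq.cache_hot_timeout == t0 + 2 * NSEC_PER_MSEC);

	/* p2 runs for 10 us, then is dequeued with an average of 1 msec. */
	toy_dequeue_to_idle(&rq, t0 + 10 * NSEC_PER_USEC, 1 * NSEC_PER_MSEC);

	return rq.cache_hot_timeout - t0;
}
```

p2's candidate deadline (t0 + 1.01 msec) is earlier than the one p1 left behind
(t0 + 2 msec), so the timeout stays at t0 + 2 msec. With a plain assignment
instead of max(), p2 would have shortened it to t0 + 1.01 msec.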

[1] https://lore.kernel.org/lkml/2a47ae82-b8cd-95db-9f48-82b3df0730f3@xxxxxxxxxxxx/

thanks,
Chenyu