Re: [PATCH v6 2/4] sched/fair: Check a task has a fitting cpu when updating misfit

From: Qais Yousef
Date: Thu Mar 07 2024 - 05:36:05 EST


On 03/07/24 10:14, Vincent Guittot wrote:
> On Wed, 6 Mar 2024 at 22:47, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > On 03/03/24 17:44, Qais Yousef wrote:
> >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 174687252e1a..b0e60a565945 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8260,6 +8260,8 @@ static void set_task_max_allowed_capacity(struct task_struct *p)
> > >  		cpumask_t *cpumask;
> > >
> > >  		cpumask = cpu_capacity_span(entry);
> > > +		if (!cpumask_intersects(cpu_active_mask, cpumask))
> > > +			continue;
> > >  		if (!cpumask_intersects(p->cpus_ptr, cpumask))
> > >  			continue;
> > >
> > > @@ -8269,6 +8271,53 @@ static void set_task_max_allowed_capacity(struct task_struct *p)
> > >  	rcu_read_unlock();
> > >  }
> > >
> > > +static void __update_tasks_max_allowed_capacity(unsigned long capacity)
> > > +{
> > > +	struct task_struct *g, *p;
> > > +
> > > +	for_each_process_thread(g, p) {
> > > +		if (fair_policy(p->policy) && p->max_allowed_capacity == capacity)
> >
> > This condition is actually not good enough. We need to differentiate between going
> > online/offline. I didn't want to call set_task_max_allowed_capacity()
> > unconditionally and make hotplug even slower.
>
> But should we even try to fix this? Hotplugging a cpu is a special
> case and with patch 4 you will not increase lb_interval anymore

To be honest I don't care either, and that was my first reaction too, but
I couldn't ignore the report.

I will need to do something to handle the dynamic EM changing capacities anyway
after the 6.9 merge window, or maybe now; I still haven't thought about it. I am
hoping I can trigger the update somewhere from the topology code. Maybe that
work will make handling hotplug easier than the rq_online/offline approach I've
taken now.
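
For reference, the shape of what I'm playing with is roughly the below. This is
only an illustrative sketch; the helper name and the online/offline filtering
are made up here for the sake of discussion, it's not the actual patch:

static void update_tasks_max_allowed_capacity(unsigned long capacity, bool online)
{
	struct task_struct *g, *p;

	rcu_read_lock();
	for_each_process_thread(g, p) {
		if (!fair_policy(p->policy))
			continue;

		/*
		 * Going online: tasks capped below the new capacity level
		 * might now fit a bigger CPU. Going offline: only tasks whose
		 * cached value matches the dying capacity level can be
		 * affected. Re-run the per-task update in both cases.
		 */
		if (online ? p->max_allowed_capacity < capacity :
			     p->max_allowed_capacity == capacity)
			set_task_max_allowed_capacity(p);
	}
	rcu_read_unlock();
}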

FWIW, I have a working patch that solves the problem. The only drawback is that
rq_online/offline() are not only called from the sched_cpu_activate/deactivate()
path but also from the build_sched_domains() path, which for some reason ends up
calling rq_offline/online() for each cpu in the map. To be even more efficient
I'd need to teach rq_offline/online() to differentiate between the two, or
refactor the code. If you don't think this is important either, I'd be happy to
drop it and replace it with a comment to see if someone cares. Only testing and
dev setups are likely to use hotplug, so there could be a difference in behavior
with regard to how often misfit migration can kick in. But as you said, hopefully
with these fixes it'd just end up being unnecessary work. The only potential
problem is that misfit lb takes precedence over other types of lb, so we could
end up delaying fixing a real imbalance if there's a pointless misfit lb?
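
And the hook point would be something along these lines (again just a sketch;
as written it still runs on the build_sched_domains() path too, which is exactly
the redundant work I'd like to avoid):

static void rq_online_fair(struct rq *rq)
{
	update_sysctl();

	update_runtime_enabled(rq);

	/*
	 * Refresh tasks' cached max_allowed_capacity now that this CPU's
	 * capacity level is reachable again. rq_offline_fair() would do
	 * the mirror image with online = false.
	 */
	update_tasks_max_allowed_capacity(arch_scale_cpu_capacity(cpu_of(rq)), true);
}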

I'm happy to follow the crowd. But it'd be nice if this series could be made
mergeable, with the rest handled as follow-up work. It'd make life much easier
for me.


Thanks!

--
Qais Yousef