Re: [PATCH v2] sched/task_group: Re-layout structure to reduce false sharing

From: Peter Zijlstra
Date: Tue Jun 27 2023 - 06:12:14 EST


On Mon, Jun 26, 2023 at 08:53:35PM +0800, Aaron Lu wrote:
> On Mon, Jun 26, 2023 at 03:52:17PM +0800, Chen Yu wrote:
> > Besides the cache line alignment, if the task is not a rt one,
> > why do we have to touch that, I wonder if the following change can avoid that:
> >
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index ec7b3e0a2b20..067f1310bad2 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1958,8 +1958,10 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
> > #endif
> >
> > #ifdef CONFIG_RT_GROUP_SCHED
> > - p->rt.rt_rq = tg->rt_rq[cpu];
> > - p->rt.parent = tg->rt_se[cpu];
> > + if (p->sched_class = &rt_sched_class) {
> == :-)
>
> > + p->rt.rt_rq = tg->rt_rq[cpu];
> > + p->rt.parent = tg->rt_se[cpu];
> > + }
> > #endif
> > }
>
> If a task starts life as a SCHED_NORMAL one and is later changed to
> an RT one, then during its next ttwu(), if it didn't migrate,
> set_task_rq() will not be called and p->rt.rt_rq will stay NULL,
> which will cause a problem when this task gets enqueued as an RT
> one.
>
> The following diff seems to cure this issue:
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c7db597e8175..8c57148e668c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7801,6 +7801,20 @@ static int __sched_setscheduler(struct task_struct *p,
> }
> __setscheduler_uclamp(p, attr);
>
> +#ifdef CONFIG_RT_GROUP_SCHED
> + /*
> + * Make sure when this task becomes a rt one,
> + * its rt fields have valid value.
> + */
> + if (rt_prio(newprio)) {
> + struct task_group *tg = task_group(p);
> + int cpu = cpu_of(rq);
> +
> + p->rt.rt_rq = tg->rt_rq[cpu];
> + p->rt.parent = tg->rt_se[cpu];
> + }
> +#endif
> +
> if (queued) {
> /*
> * We enqueue to tail when the priority of a task is
>
> But I'm not sure if it's worth the trouble.

Not sufficient, you can become RT through PI and not pass
__sched_setscheduler().

The common code-path in this case would be check_class_changed(), that's
called for both PI and __sched_setscheduler().
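
To illustrate the point, here is a minimal user-space model, not kernel
code. It assumes the lazy set_task_rq() from the proposal above and
fills in the RT pointer in a single hook that both class-change paths
reach. The struct layouts, the simplified check_class_changed()
signature, and the helpers change_class_via_setscheduler(),
change_class_via_pi() and demo_both_paths() are all hypothetical
stand-ins chosen for the sketch; the real kernel function takes
different arguments and does more work.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2	/* arbitrary size for the model */

/* Simplified stand-ins for the kernel structures (assumptions). */
struct rt_rq { int cpu; };
struct sched_class { const char *name; };
struct task_group { struct rt_rq *rt_rq[NR_CPUS]; };

struct task {
	const struct sched_class *sched_class;
	struct task_group *tg;
	int cpu;
	struct { struct rt_rq *rt_rq; } rt;	/* rt.parent would be handled the same way */
};

static const struct sched_class fair_sched_class = { "fair" };
static const struct sched_class rt_sched_class = { "rt" };

/* Lazy variant from the proposal: only touch RT fields for RT tasks. */
static void set_task_rq(struct task *p, int cpu)
{
	p->cpu = cpu;
	if (p->sched_class == &rt_sched_class)
		p->rt.rt_rq = p->tg->rt_rq[cpu];
}

/*
 * Model of the common hook: whenever the class changes to RT, make
 * sure the RT pointer is valid, regardless of which path got us here.
 */
static void check_class_changed(struct task *p,
				const struct sched_class *prev_class)
{
	assert(p->tg != NULL);
	if (prev_class != p->sched_class &&
	    p->sched_class == &rt_sched_class)
		p->rt.rt_rq = p->tg->rt_rq[p->cpu];
}

/* Hypothetical stand-in for the __sched_setscheduler() path. */
static void change_class_via_setscheduler(struct task *p,
					  const struct sched_class *new_class)
{
	const struct sched_class *prev_class = p->sched_class;

	p->sched_class = new_class;
	check_class_changed(p, prev_class);
}

/* Hypothetical stand-in for the PI boost path. */
static void change_class_via_pi(struct task *p,
				const struct sched_class *new_class)
{
	const struct sched_class *prev_class = p->sched_class;

	p->sched_class = new_class;
	check_class_changed(p, prev_class);
}

/* Returns 1 if both paths leave a valid rt_rq pointer behind. */
static int demo_both_paths(void)
{
	static struct rt_rq rqs[NR_CPUS];
	struct task_group tg = { { &rqs[0], &rqs[1] } };
	struct task a = { &fair_sched_class, &tg, 0, { NULL } };
	struct task b = { &fair_sched_class, &tg, 0, { NULL } };

	/* Placed on CPU 1 while still fair: RT pointer stays NULL. */
	set_task_rq(&a, 1);
	set_task_rq(&b, 1);
	if (a.rt.rt_rq != NULL || b.rt.rt_rq != NULL)
		return 0;

	change_class_via_setscheduler(&a, &rt_sched_class);
	change_class_via_pi(&b, &rt_sched_class);

	return a.rt.rt_rq == &rqs[1] && b.rt.rt_rq == &rqs[1];
}
```

In this model neither caller sets the RT pointer itself; putting the
fixup in the shared hook is what covers the PI case that a change
confined to __sched_setscheduler() would miss.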

Anyway, not against this per-se, but RT_GROUP_SCHED is utter shite and
nobody should be using it. Also, if there's no measurable performance
gain (as stated elsewhere IIRC) we shouldn't be adding complexity.