Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

From: Pavan Kondeti
Date: Wed Apr 29 2020 - 11:21:27 EST


Hi Qais,

On Wed, Apr 29, 2020 at 01:30:57PM +0100, Qais Yousef wrote:
> Hi Pavan
>
> On 04/29/20 17:02, Pavan Kondeti wrote:
> > Hi Qais,
> >
> > On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> >
> > [...]
> >
> > >
> > > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > > +{
> > > + struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > > +
> > > + if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > > + uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > > +}
> >
> > Unlike system default clamp values, RT default value is written to
> > p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> > lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> > big deal. Just sharing my observation. Is this how you expected it to work?
>
> This is how I expect it to work yes.
>
> Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
> values' as you called them, are constraints. They define the range any task is
> _allowed to_ *request*.
>
> The new sysctl_sched_uclamp_util_min_rt_default is setting the default
> *request* for RT tasks. And this was done since historically RT tasks has
> always run at the highest performance point in Linux.
>
> Can you think of a scenario where setting the default *request* of uclamp.max
> for all RT tasks helps?
>

It was a hypothetical question :-) Thanks for clearly explaining the
difference between system default clamp values (constraints) vs the newly
introduced sysctl_sched_uclamp_util_min_rt_default (request for RT tasks).

> I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
> suffice. Or one can use cgroup to selectively cap some tasks.
>
> For example if you don't want any task in the system to be boosted to anything
> higher than 800, just do
>
> sysctl_sched_uclamp_util_max = 800
>
> Or if you want to be selective, the same thing can be achieved by creating
> a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.
>
> >
> > > +
> > > static inline struct uclamp_se
> > > uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > > {
> > > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > > static inline struct uclamp_se
> > > uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> > > {
> > > - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > > - struct uclamp_se uc_max = uclamp_default[clamp_id];
> > > + struct uclamp_se uc_req, uc_max;
> > > +
> > > + /*
> > > + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > > + */
> > > + uclamp_sync_util_min_rt_default(p);
> > > +
> > > + uc_req = uclamp_tg_restrict(p, clamp_id);
> > > + uc_max = uclamp_default[clamp_id];
> >
> > We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> > clamp_id == UCLAMP_MAX case. Would it be better to have a separate
>
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.
>
> > uclamp_default for RT like uclamp_default_rt and select uc_max based
> > on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> > we can cover the case of uclamp_util_min < uclamp_util_min_rt.
>
> Hmm I'm not sure I got you.
>
> uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
> to take.
>
> What's added here is simply setting the requested uclamp.min value, which if
> the constraints allow it will be applied.
>
> For example. If
>
> sysctl_sched_uclamp_util_min = 0
> sysctl_sched_uclamp_util_min_rt_default = 400
>
> uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
> while the task still requests to be boosted to 1024, but the system constraint
> prevents it from getting it. But as soon as the system constraint has changed,
> the task might be able to get what it wants.
>
> For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024
>
> sysctl_sched_uclamp_util_min = 1024
> sysctl_sched_uclamp_util_min_rt_default = 400
>
> uclamp_eff_get() return 400 for the task now, as requested.
>

Yes, I did understand the relation between sysctl_sched_uclamp_util_min and
sysctl_sched_uclamp_util_min_rt_default . It is not clear why we are copying
sysctl_sched_uclamp_util_min_rt_default into p->uclamp_req[UCLAMP_MIN] though
user did not explicitly asked for it. In the above reply, you clearly
mentioned that the new knob works like a request from the user space for all
RT tasks.

As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
sched_setattr() was not called before. I guess that is expected behavior with
your definition of this new tunable. Thanks for answering the question in
detail.

Thanks,
Pavan

> Thanks
>
> --
> Qais Yousef
>
> >
> > >
> > > /* System default restrictions always apply */
> > > if (unlikely(uc_req.value > uc_max.value))
> > > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > loff_t *ppos)
> > > {
> > > bool update_root_tg = false;
> > > - int old_min, old_max;
> > > + int old_min, old_max, old_min_rt;
> > > int result;
> > >
> > > mutex_lock(&uclamp_mutex);
> > > old_min = sysctl_sched_uclamp_util_min;
> > > old_max = sysctl_sched_uclamp_util_max;
> > > + old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> > >
> > > result = proc_dointvec(table, write, buffer, lenp, ppos);
> > > if (result)
> > > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > goto undo;
> > > }
> > >
> > > + /*
> > > + * The new value will be applied to RT tasks the next time the
> > > + * scheduler needs to calculate the effective uclamp.min for that task,
> > > + * assuming the task is using the system default and not a user
> > > + * specified value. In the latter we shall leave the value as the user
> > > + * requested.
> > > + */
> > > + if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > > + result = -EINVAL;
> > > + goto undo;
> > > + }
> > > +
> > > if (old_min != sysctl_sched_uclamp_util_min) {
> > > uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> > > sysctl_sched_uclamp_util_min, false);
> > > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > undo:
> > > sysctl_sched_uclamp_util_min = old_min;
> > > sysctl_sched_uclamp_util_max = old_max;
> > > + sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> > > done:
> > > mutex_unlock(&uclamp_mutex);
> > >
> > > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > > if (uc_se->user_defined)
> > > continue;
> > >
> > > - /* By default, RT tasks always get 100% boost */
> > > + /*
> > > + * By default, RT tasks always get 100% boost, which the admins
> > > + * are allowed to change via
> > > + * sysctl_sched_uclamp_util_min_rt_default knob.
> > > + */
> > > if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > > - clamp_value = uclamp_none(UCLAMP_MAX);
> > > + clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> > >
> > > uclamp_se_set(uc_se, clamp_value, false);
> > > }
> > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > > index 8a176d8727a3..64117363c502 100644
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> > > .mode = 0644,
> > > .proc_handler = sysctl_sched_uclamp_handler,
> > > },
> > > + {
> > > + .procname = "sched_util_clamp_min_rt_default",
> > > + .data = &sysctl_sched_uclamp_util_min_rt_default,
> > > + .maxlen = sizeof(unsigned int),
> > > + .mode = 0644,
> > > + .proc_handler = sysctl_sched_uclamp_handler,
> > > + },
> > > #endif
> > > #ifdef CONFIG_SCHED_AUTOGROUP
> > > {
> > > --
> > > 2.17.1
> > >
> >
> > Thanks,
> > Pavan
> >
> > --
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.