Re: [PATCH v3 1/2] sched, cgroup: Restore meaning to hierarchical_quota

From: Phil Auld
Date: Tue Jul 18 2023 - 08:58:58 EST


On Mon, Jul 17, 2023 at 08:27:24AM -1000 Tejun Heo wrote:
> On Fri, Jul 14, 2023 at 08:57:46AM -0400, Phil Auld wrote:
> > In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task
> > groups due to the previous fix simply taking the min. It should
> > reflect a limit imposed at that level or by an ancestor. Even
> > though cgroupv2 does not require child quota to be less than or
> > equal to that of its ancestors the task group will still be
> > constrained by such a quota so this should be shown here. Cgroupv1
> > continues to set this correctly.
> >
> > In both cases, add initialization when a new task group is created
> > based on the current parent's value (or RUNTIME_INF in the case of
> > root_task_group). Otherwise, the field is wrong until a quota is
> > changed after creation and __cfs_schedulable() is called.
> >
> > Fixes: c53593e5cb69 ("sched, cgroup: Don't reject lower cpu.max on ancestors")
>
> Does this really fix anything observable? I wonder whether this is more
> misleading than helpful. In cgroup2, the value simply wasn't being used,
> right?
>

It wasn't being used but was actively being set wrong. I mean if we are
going to bother doing the __cfs_schedulable() tg tree walk we might as
well have not been setting a bogus value. But that said, no it was not
observable until I tried to use it.

I'm fine if that's dropped. I just wanted it set right going forward :)


> > Signed-off-by: Phil Auld <pauld@xxxxxxxxxx>
> > Reviewed-by: Ben Segall <bsegall@xxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
> > Cc: Ben Segall <bsegall@xxxxxxxxxx>
> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Cc: Tejun Heo <tj@xxxxxxxxxx>
>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>
>

Thanks!


> > + * always take the non-RUNTIME_INF min. On cgroup1, only
> > + * inherit when no limit is set. In both cases this is used
> > + * by the scheduler to determine if a given CFS task has a
> > + * bandwidth constraint at some higher level.
>
> The discussion on this comment is stretching too long and this is fine too
> but what's worth commenting for cgroup2 is that the limit value itself
> doesn't mean anything and we're just latching onto the value used by cgroup1
> to track whether there's any limit active or not.

I thought that was implied by the wording. "If a given task has a bandwidth
contraint" not "what a given task's bandwidth constraint is". In both cases
that's how the other parts of the scheduler are using it. The actual
non-RUNTIME_INF value only matters in this function (and only for cgroup1
indeed).

But... the value is just as accurate for cgroup2 and cgroup1. The task is
still going to be limited by that bandwidth constraint even if its own
bandwidth limit is nominally higher, right?


Cheers,
Phil

>
> Thanks.
>
> --
> tejun
>

--