Re: [PATCH v3] sched: async unthrottling for cfs bandwidth

From: Josh Don
Date: Mon Nov 21 2022 - 14:37:34 EST


On Mon, Nov 21, 2022 at 3:58 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Sun, Nov 20, 2022 at 10:22:40AM +0800, Chengming Zhou wrote:
> > > + if (cfs_rq->runtime_remaining > 0) {
> > > + if (cpu_of(rq) != this_cpu ||
> > > + SCHED_WARN_ON(local_unthrottle)) {
> > > + unthrottle_cfs_rq_async(cfs_rq);
> > > + } else {
> > > + local_unthrottle = cfs_rq;
> > > + }
> > > + } else {
> > > + throttled = true;
> > > + }
> >
> > Hello,
> >
> > I don't get the point why local unthrottle is put after all the remote cpus,
> > since this list is FIFO? (earliest throttled cfs_rq is at the head)
>
> Let the local completion time for a CPU be W. Then if we queue a remote
> work after the local synchronous work, the lower bound for total
> completion is at least 2W.
>
> OTOH, if we first queue all remote work and then process the local
> synchronous work, the lower bound for total completion is W.
>
> The practical difference is that all relevant CPUs get unthrottled
> roughly at the same point in time, unlike with the original case, where
> some CPUs have the opportunity to consume W runtime while another is
> still throttled.

Yep, this tradeoff feels "best", but there are some edge cases where
this could potentially disrupt fairness: for example, when W is
non-trivial, there are a lot of cpus to iterate through when
dispatching the remote unthrottles, and quota is small. It doesn't
help that the timer is pinned, so this will continually hit the same
cpu. But as I alluded to, I think the net benefit here is greater with
the local unthrottling ordered last.
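
To put rough numbers on both points, here is a toy userspace model (not
kernel code). It assumes every cpu takes the same time w to unthrottle
and that queueing each remote unthrottle costs a fixed d on the timer
cpu; the struct, the function names, and the d parameter are all made
up for illustration.

```c
#include <assert.h>

/* Hypothetical model; all times are in the same arbitrary unit. */
struct unthrottle_times {
	long local_done;	/* when the timer cpu finishes its own unthrottle */
	long last_done;		/* when the last cpu finishes unthrottling */
};

/* Local synchronous unthrottle first, then queue the remote work. */
static struct unthrottle_times local_first(int ncpus, long w, long d)
{
	struct unthrottle_times t;

	assert(ncpus >= 1 && w >= 0 && d >= 0);
	t.local_done = w;
	/* Remote i (1..ncpus-1) is queued at w + i*d and needs w more. */
	t.last_done = ncpus > 1 ? w + (ncpus - 1) * d + w : w;
	return t;
}

/* Queue all remote work first, then do the local unthrottle. */
static struct unthrottle_times local_last(int ncpus, long w, long d)
{
	struct unthrottle_times t;

	assert(ncpus >= 1 && w >= 0 && d >= 0);
	/* The timer cpu only starts its own work after (ncpus-1) dispatches. */
	t.local_done = (ncpus - 1) * d + w;
	/* Remote i finishes at i*d + w, which is never later than the local cpu. */
	t.last_done = t.local_done;
	return t;
}
```

With ncpus=4, w=100, d=1 the model gives a total of 203 for local-first
versus 103 for local-last, which is the 2W vs W bound from above. The
edge case shows up once (ncpus-1)*d is no longer small compared to w:
with ncpus=256 and d=5, the timer cpu finishes at 1375 with local-last
instead of 100 with local-first, and with a pinned timer it is the same
cpu eating that delay every period.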