Re: [PATCH v2] sched: async unthrottling for cfs bandwidth

From: Josh Don
Date: Tue Nov 01 2022 - 15:12:46 EST


On Mon, Oct 31, 2022 at 6:46 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Mon, Oct 31, 2022 at 06:01:19PM -0700, Josh Don wrote:
> > > Yeah, especially with narrow cpuset (or task cpu affinity) configurations,
> > > it can get pretty bad. Outside that tho, at least I haven't seen a lot of
> > > problematic cases as long as the low priority one isn't tightly entangled
> > > with high priority tasks, mostly because 1. if the resource the low pri one
> > > is holding affects large part of the system, the problem is self-solving as
> > > the system quickly runs out of other things to do 2. if the resource isn't
> > > affecting large part of the system, their blast radius is usually reasonably
> > > confined to things tightly coupled with it. I'm sure there are exceptions
> > > and we definitely wanna improve the situation where it makes sense.
> >
> > cgroup_mutex and kernfs rwsem beg to differ :) These are shared with
> > control plane threads, so it is pretty easy to starve those out even
> > while the system has plenty of work to do.
>
> Hahaha yeah, good point. We definitely wanna improve them. There were some
> efforts to improve kernfs locking granularity earlier this year. It was
> promising but didn't get to the finish line. cgroup_mutex, w/ cgroup2 and
> especially with the optimizations around CLONE_INTO_CGROUP, we avoid that in
> most hot paths and hopefully that should help quite a bit. If it continues
> to be a problem, we definitely wanna further improve it.
>
> Just to better understand the situation, can you give some more details on
> the scenarios where cgroup_mutex was in the middle of a shitshow?

There have been a couple; I think one of the main ones has been writes
to cgroup.procs. cpuset modifications also show up, since there's a
mutex there.

>
> Thanks.
>
> --
> tejun