RE: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: Yu, Fenghua
Date: Wed Oct 12 2022 - 13:00:03 EST


Hi, Peter,

> > > I don't know how complex it would be for the kernel to implement this.
> > > Or whether it would meet Google's needs.
> > >
> >
> > How about moving monitor groups from one control group to another?
> >
> > Based on the initial description I got the impression that there is
> > already a monitor group for every container. (Please correct me if I
> > am wrong). If this is the case then it may be possible to create an
> > interface that could move an entire monitor group to another control
> > group. This would keep the benefit of usage counts remaining intact,
> > tasks get a new closid, but keep their rmid. There would be no need for
> > the user to specify process-ids.
>
> Yes, Stephane also pointed out the importance of maintaining RMID
> assignments as well and I don't believe I put enough emphasis on it during my
> original email.
>
> We need to maintain accurate memory bandwidth usage counts on all
> containers, so it's important to be able to maintain an RMID assignment and its
> event counts across a CoS downgrade. The solutions Tony suggested do solve
> the races in moving the tasks, but the container would need to temporarily join
> the default MON group in the new CTRL_MON group before it can be moved to
> its replacement MON group.
>
> Being able to re-parent a MON group would allow us to change the CLOSID
> independently of the RMID in a container and would address the issue.
>
> The only other point I can think of to differentiate it from the automatic CLOSID
> management solution is whether the 1:1 CTRL_MON:CLOSID approach will
> become too limiting going forward. For example, if there are configurations
> where one resource has far fewer CLOSIDs than others and we want to start
> assigning CLOSIDs on-demand, per-resource to avoid wasting other resources'
> available CLOSID spaces. If we can foresee this becoming a concern, then
> automatic CLOSID management would be inevitable.

In the very first resctrl implementation, we did foresee uneven per-resource CLOSID
counts and allocated CLOSIDs per resource on demand to avoid wasting them. But that
implementation was too complex and bug-prone, and it was not blessed by
the community. So we changed to static allocation based on the minimum CLOSID count
across resources, and decided to move to per-resource on-demand allocation only if it
proves really useful.
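
To make the static scheme concrete, here is a minimal userspace sketch of
allocating from the minimum CLOSID count across resources. It is not the
kernel code: the function names, the 32-bit free map, and the example counts
are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the smallest CLOSID count across all resources (hypothetical helper). */
unsigned int min_closid(const unsigned int *num_closid, size_t nres)
{
	unsigned int min = 32;	/* the free map below is a 32-bit bitmap */
	size_t i;

	assert(nres > 0);
	for (i = 0; i < nres; i++)
		if (num_closid[i] < min)
			min = num_closid[i];
	return min;
}

/*
 * Build the free map: one bit per usable CLOSID.
 * CLOSID 0 is assumed to be reserved for the default group.
 */
uint32_t closid_map_init(unsigned int min)
{
	uint32_t map = (min >= 32) ? UINT32_MAX : ((UINT32_C(1) << min) - 1);

	return map & ~UINT32_C(1);
}

/* Allocate the lowest free CLOSID, or return -1 if none is left. */
int closid_alloc(uint32_t *map)
{
	unsigned int id;

	for (id = 0; id < 32; id++) {
		if (*map & (UINT32_C(1) << id)) {
			*map &= ~(UINT32_C(1) << id);
			return (int)id;
		}
	}
	return -1;
}

/* Return a CLOSID to the free map. */
void closid_free(uint32_t *map, unsigned int id)
{
	*map |= UINT32_C(1) << id;
}
```

With made-up counts of 16, 8 and 4 CLOSIDs on three resources, the map only
offers CLOSIDs 1 to 3. The extra CLOSIDs on the first two resources go unused,
which is the waste that a per-resource on-demand scheme would avoid.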

But there has been no real use case so far, so the current CLOSID assignment still stands.

In your case, only two CLOSIDs are used, right? So the current CLOSID assignment can still be used, right?
If that's the case, unnecessary complexity and bug-proneness may still be a problem with per-resource
on-demand CLOSID assignment.

Thanks.

-Fenghua