Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: James Morse
Date: Wed Oct 12 2022 - 12:55:53 EST


Hi guys,

On 12/10/2022 12:21, Peter Newman wrote:
> On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>> On 10/7/2022 10:28 AM, Tony Luck wrote:
>>> I don't know how complex it would be for the kernel to implement this. Or
>>> whether it would meet Google's needs.
>>>
>>
>> How about moving monitor groups from one control group to another?
>>
>> Based on the initial description I got the impression that there is
>> already a monitor group for every container. (Please correct me if I am
>> wrong). If this is the case then it may be possible to create an interface
>> that could move an entire monitor group to another control group. This would
>> keep the benefit of usage counts remaining intact, tasks get a new closid, but
>> keep their rmid. There would be no need for the user to specify process-ids.

> Yes, Stephane also pointed out the importance of maintaining RMID
> assignments, and I don't believe I put enough emphasis on it in my
> original email.
>
> We need to maintain accurate memory bandwidth usage counts on all
> containers, so it's important to be able to maintain an RMID assignment
> and its event counts across a CoS downgrade. The solutions Tony
> suggested do solve the races in moving the tasks, but the container
> would need to temporarily join the default MON group in the new CTRL_MON
> group before it can be moved to its replacement MON group.
>
> Being able to re-parent a MON group would allow us to change the CLOSID
> independently of the RMID in a container and would address the issue.
>
> The only other point I can think of to differentiate it from the
> automatic CLOSID management solution is whether the 1:1 CTRL_MON:CLOSID
> approach will become too limiting going forward. For example, there may
> be configurations where one resource has far fewer CLOSIDs than others,
> and we'd want to start assigning CLOSIDs on-demand, per-resource, to
> avoid wasting other resources' available CLOSID space. If we can
> foresee this becoming a concern, then automatic CLOSID management would
> be inevitable.

You originally asked:
| Any concerns about the CLOSID-reusing behavior?

I don't think this will work well with MPAM ... I expect it will mess up the bandwidth
counters.

MPAM's equivalent to RMID is PMG. While on x86 CLOSID and RMID are independent numbers,
this isn't true for PARTID (MPAM's version of CLOSID) and PMG. The PMG bits effectively
extend the PARTID with bits that aren't used to look up the configuration.
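
As a rough sketch of that layout (the field widths here are invented - if I remember the
spec right, MPAM allows up to 16 bits of PARTID and 8 of PMG, and real systems implement
far fewer):

	#include <stdint.h>

	/* Illustrative widths only - not from any real system. */
	#define EXAMPLE_PARTID_BITS	8
	#define EXAMPLE_PMG_BITS	1

	/* The label a memory request carries: PMG bits sit above the PARTID. */
	static inline uint16_t mpam_label(uint16_t partid, uint8_t pmg)
	{
		return (uint16_t)(pmg << EXAMPLE_PARTID_BITS) | partid;
	}

	/* Only the PARTID part is used to look up a configuration. */
	static inline uint16_t mpam_config_index(uint16_t label)
	{
		return label & ((1u << EXAMPLE_PARTID_BITS) - 1);
	}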

x86's monitors match only on RMID, and there are 'enough' RMID... MPAM's monitors are more
complicated. I've seen details of a system that only has 1 bit of PMG space.

While MPAM's bandwidth monitors can match on just the PMG, there aren't expected to be
enough PMG values for every control/monitor group to have a unique one. Instead, MPAM's
monitors are expected to be used with both the PARTID and PMG.

('bandwidth monitors' is relevant here: MPAM's 'cache storage utilisation' monitors can't
match on just PMG at all - they have to be told the PARTID too)
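
To make the difference concrete, the matching logic is roughly this (the MATCH_PARTID and
MATCH_PMG flags are from the MPAM spec's MBWU monitor configuration; the helpers are made
up for illustration):

	#include <stdbool.h>
	#include <stdint.h>

	/* x86: traffic is counted if the RMID alone matches the monitor's. */
	static bool x86_counts(uint32_t traffic_rmid, uint32_t mon_rmid)
	{
		return traffic_rmid == mon_rmid;
	}

	/*
	 * MPAM: the bandwidth monitor has separate MATCH_PARTID and
	 * MATCH_PMG flags. With so few PMG bits, both are normally set -
	 * so traffic tagged with a stale PARTID stops being counted the
	 * moment the group's PARTID changes, even though its PMG hasn't.
	 */
	static bool mpam_mbwu_counts(uint16_t traffic_partid, uint8_t traffic_pmg,
				     uint16_t mon_partid, uint8_t mon_pmg,
				     bool match_partid, bool match_pmg)
	{
		if (match_partid && traffic_partid != mon_partid)
			return false;
		if (match_pmg && traffic_pmg != mon_pmg)
			return false;
		return true;
	}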


If you're re-using CLOSIDs like this, I think you'll end up with noisy measurements on MPAM
systems, as the caches still hold allocations tagged with PARTID/PMG values from before the
re-use pattern changed, and the monitors have to match on both.


I have half-finished patches that add a 'resctrl' cgroup controller that can be used to
group tasks and assign them to control or monitor groups. (the creation and configuration
of control and monitor groups stays in resctrl - it effectively makes the tasks file
read-only). I think this might help, as a group of processes can be moved between two
control/monitor groups with one syscall. New processes inherit the cgroup's setting instead
of their parent task's.

If you want to take a look, it's here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/v6.0&id=4e5987d8ecbc8647dee0aebfb73c3890843ef5dd
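
To show the difference: moving a container by hand today takes one write per task, racing
against fork() the whole time, whereas with the controller the whole cgroup moves with a
single write. Roughly like this - note "resctrl.group" and its location are invented here
for illustration, the real interface is whatever the branch above ends up with:

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/types.h>
	#include <unistd.h>

	/* Today: each task is written to the destination group's tasks file. */
	static int move_one_task(const char *rdtgroup, pid_t pid)
	{
		char path[256], buf[16];
		int fd, ret;

		snprintf(path, sizeof(path), "/sys/fs/resctrl/%s/tasks", rdtgroup);
		snprintf(buf, sizeof(buf), "%d", pid);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -1;
		ret = (write(fd, buf, strlen(buf)) < 0) ? -1 : 0;
		close(fd);
		return ret;
	}

	/*
	 * With the cgroup controller: one write re-points every task in the
	 * cgroup, and new children inherit it. ("resctrl.group" is a
	 * hypothetical file name.)
	 */
	static int move_cgroup(const char *cgroup, const char *rdtgroup)
	{
		char path[256];
		int fd, ret;

		snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/resctrl.group", cgroup);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -1;
		ret = (write(fd, rdtgroup, strlen(rdtgroup)) < 0) ? -1 : 0;
		close(fd);
		return ret;
	}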

I've not worked the cgroup thread stuff out yet ... it doesn't appear to hook thread
creation, only fork().


Thanks,

James