Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: James Morse
Date: Wed Oct 19 2022 - 09:34:43 EST


Hi Peter,

On 19/10/2022 10:08, Peter Newman wrote:
> On Wed, Oct 12, 2022 at 7:23 PM Reinette Chatre
> <reinette.chatre@xxxxxxxxx> wrote:
>> What if resctrl adds support to rdtgroup_kf_syscall_ops for
>> the .rename callback?
>>
>> It seems like doing so could enable users to do something like:
>> mv /sys/fs/resctrl/groupA/mon_groups/containerA /sys/fs/resctrl/groupB/mon_groups/
>>
>> Such a user request would trigger the "containerA" monitor group
>> to be moved to another control group. All tasks within it could be moved to
>> the new control group (their CLOSIDs are changed) while their RMIDs
>> remain intact.
>
> I think this will be the best approach for us, since we need separate
> counters for every job. Unless you were planning to implement this very
> soon, I will prototype it for the container manager team to try out and
> submit patches for review if it works for them.
>
>> I just read James's response and I do not know how this could be made to
>> work with the Arm monitoring when it arrives. Potentially there
>> could be an architecture specific "move monitor group" call.
>
> AFAICT all we could do in that situation is hope there are plenty of
> CLOSIDs, since we wouldn't be able to create any additional monitoring
> groups.
>
> What's still unclear to me is exactly how an application would interpret
> the reported CLOSID and RMID counts to decide whether it should create
> lots of MON groups vs CTRL_MON groups, given that the RMID count would
> mean something semantically different on MPAM.

Yeah - it's top of the list in the 'ABI problems' section of the KNOWN_ISSUES file.


> I would not want to see
> the container manager asking itself "am I on an ARM system?" when
> calculating how many containers' bandwidth usage it can count.

This would be terrible!


> (Maybe James has an answer to this question.)

I don't. It's an unfortunate difference that is visible to user-space.

Currently the MPAM tree proposes to expose '1' as num_rmid on arm64, because the right
answer depends on whether you intend to create monitoring groups or control groups.

My best bet is to expose some new properties, 'num_groups' at the root level (which would
have the same value as num_closid), and inside each control group's 'mon_groups'. For x86
the latter would be the same as num_rmid, but on arm64 it would be the maximum PMG bits.
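
A rough user-space sketch of how a container manager might consume such
properties, purely for illustration. The 'num_groups' file names and their
locations are assumptions taken from the proposal above, and nothing exposes
them today. The fallback reads the existing info/L3/num_closids and
info/L3_MON/num_rmids files, which only carry the x86 meaning. The helper
names read_int() and group_limits() are made up for this example.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs-style file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def group_limits(root: Path, ctrl_group: str = "") -> dict:
    """Report the control-group limit and the monitor-group limit for one
    control group (the empty string means the default/root group).

    Prefers the hypothetical 'num_groups' files and falls back to the
    existing info/ files when they are absent.
    """
    # Hypothetical: root-level 'num_groups', same value as num_closid.
    ctrl = read_int(root / "num_groups")
    if ctrl is None:
        ctrl = read_int(root / "info" / "L3" / "num_closids")

    # Hypothetical: per-control-group 'mon_groups/num_groups'.
    mon = read_int(root / ctrl_group / "mon_groups" / "num_groups")
    if mon is None:
        # Existing file; semantically only right for x86.
        mon = read_int(root / "info" / "L3_MON" / "num_rmids")

    return {"control_groups": ctrl, "monitor_groups": mon}
```

Calling group_limits(Path("/sys/fs/resctrl"), "groupA") would then give the
same kind of answer on both architectures, without the caller having to ask
which one it is running on.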


Thanks,

James