Re: [RFD] resctrl: reassigning a running container's CTRL_MON group

From: Peter Newman
Date: Mon Oct 17 2022 - 06:16:29 EST


Hi James,

On Wed, Oct 12, 2022 at 6:55 PM James Morse <james.morse@xxxxxxx> wrote:
> You originally asked:
> | Any concerns about the CLOSID-reusing behavior?
>
> I don't think this will work well with MPAM ... I expect it will mess up the bandwidth
> counters.
>
> MPAM's equivalent to RMID is PMG. While on x86 CLOSID and RMID are independent numbers,
> this isn't true for PARTID (MPAM's version of CLOSID) and PMG. The PMG bits effectively
> extended the PARTID with bits that aren't used to look up the configuration.
>
> x86's monitors match only on RMID, and there are 'enough' RMID... MPAMs monitors are more
> complicated. I've seen details of a system that only has 1 bit of PMG space.
>
> While MPAM's bandwidth monitors can match just the PMG, there aren't expected to be enough
> unique PMG for every control/monitor group to have a unique value. Instead, MPAM's
> monitors are expected to be used with both the PARTID and PMG.
>
> ('bandwidth monitors' is relevant here, MPAM's 'cache storage utilisation' monitors can't
> match on just PMG at all - they have to be told the PARTID too)
>
>
> If you're re-using CLOSID like this, I think you'll end up with noisy measurements on MPAM
> systems as the caches hold PARTID/PMG values from before the re-use pattern changed, and
> the monitors have to match on both.

Yes, that sounds like it would be an issue.

Following your refactoring changes, hopefully the MPAM driver could
offer alternative methods for managing PARTIDs and PMGs depending on the
available hardware resources.

If there are a lot more PARTIDs than PMGs, that would suit a user who
never creates child MON groups. If the number of MON groups outgrows the
number of CTRL_MON groups and you've run out of PMGs, perhaps you could
try to allocate another PARTID and program it with the same partitioning
configuration before giving up. Of course, there wouldn't be much point
in reusing PARTIDs in such a configuration either.
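
To make that fallback concrete, here is a rough user-space sketch in C.
None of these names come from resctrl or the MPAM driver: struct
ctrl_group, partid_alloc(), mon_group_alloc(), and the NUM_PARTID,
NUM_PMG and MAX_ALIAS limits are all made up for illustration, and the
"configuration" is just an integer standing in for whatever the driver
would program. It only models the bookkeeping: use a free PMG under a
PARTID the group already owns, and otherwise take another PARTID and
give it the same configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_PARTID 8	/* assumed: plenty of PARTIDs */
#define NUM_PMG    2	/* assumed: only 1 bit of PMG space */
#define MAX_ALIAS  4	/* assumed cap on PARTIDs per CTRL_MON group */

/* Hypothetical model of one CTRL_MON group. */
struct ctrl_group {
	unsigned int config;			/* stand-in for the partitioning config */
	int partid[MAX_ALIAS];			/* PARTIDs programmed with that config */
	int nr_partid;
	bool pmg_used[MAX_ALIAS][NUM_PMG];	/* PMGs in use under each PARTID */
};

struct mon_id {
	int partid;
	int pmg;
};

bool partid_busy[NUM_PARTID];
unsigned int partid_config[NUM_PARTID];

/* Take a free PARTID and "program" it with config; -1 if none are left. */
int partid_alloc(unsigned int config)
{
	for (int i = 0; i < NUM_PARTID; i++) {
		if (!partid_busy[i]) {
			partid_busy[i] = true;
			partid_config[i] = config;
			return i;
		}
	}
	return -1;
}

int ctrl_group_init(struct ctrl_group *cg, unsigned int config)
{
	memset(cg, 0, sizeof(*cg));
	cg->config = config;
	cg->partid[0] = partid_alloc(config);
	if (cg->partid[0] < 0)
		return -1;
	cg->nr_partid = 1;
	return 0;
}

/* Allocate a (PARTID, PMG) pair for a new monitoring group under cg. */
int mon_group_alloc(struct ctrl_group *cg, struct mon_id *id)
{
	int a, p, partid;

	assert(cg->nr_partid >= 1);

	/* First choice: a free PMG under a PARTID the group already owns. */
	for (a = 0; a < cg->nr_partid; a++) {
		for (p = 0; p < NUM_PMG; p++) {
			if (!cg->pmg_used[a][p]) {
				cg->pmg_used[a][p] = true;
				id->partid = cg->partid[a];
				id->pmg = p;
				return 0;
			}
		}
	}

	/* Out of PMGs: try another PARTID with the same config. */
	if (cg->nr_partid == MAX_ALIAS)
		return -1;
	partid = partid_alloc(cg->config);
	if (partid < 0)
		return -1;	/* out of PARTIDs too, give up */

	a = cg->nr_partid++;
	cg->partid[a] = partid;
	cg->pmg_used[a][0] = true;
	id->partid = partid;
	id->pmg = 0;
	return 0;
}
```

With NUM_PMG set to 2, the third monitoring group created under a
CTRL_MON group lands on a second PARTID carrying the same configuration.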

If we used the child MON groups as the primary vehicle for moving a
container's tasks between a small number of CTRL_MON groups like in
Reinette's proposal, then it seems like it would be a better use of
hardware to have many PMGs and few PARTIDs. In that case, the monitors
would only match on PMGs. Provided that there are sufficient monitor
instances, there would never be any need to reprogram a monitor's
PMG.
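
A toy model of why PMG-only matching avoids reprogramming, again in
user-space C with invented names (struct monitor, monitor_observe() and
friends are not MPAM driver interfaces, and real monitors are of course
programmed through registers rather than a struct). Traffic carries a
(PARTID, PMG) label; moving a container to a different CTRL_MON group
changes the PARTID in the label but not the PMG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Label carried by a memory transaction (hypothetical model). */
struct label {
	int partid;
	int pmg;
};

/* Hypothetical bandwidth monitor: always filters on PMG, optionally on PARTID. */
struct monitor {
	int pmg;
	bool match_partid;
	int partid;
	uint64_t bytes;
};

bool monitor_matches(const struct monitor *m, struct label l)
{
	if (l.pmg != m->pmg)
		return false;
	return !m->match_partid || l.partid == m->partid;
}

/* Count the traffic only if its label passes the monitor's filter. */
void monitor_observe(struct monitor *m, struct label l, uint64_t bytes)
{
	assert(m != NULL);
	if (monitor_matches(m, l))
		m->bytes += bytes;
}
```

In this model, a monitor that matches only on the PMG keeps counting a
container's traffic after the container's PARTID changes, while one that
matches on both stops counting until it is reprogrammed with the new
PARTID.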

> I have half-finished patches that add a 'resctrl' cgroup controller that can be used to
> group tasks and assign them to control or monitor groups. (the creation and configuration
> of control and monitor groups stays in resctrl - it effectively makes the tasks file
> read-only). I think this might help, as a group of processes can be moved between two
> control/monitor groups with one syscall. New processes that are created inherit from the
> cgroup setting instead of their parent task.
>
> If you want to take a look, it's here:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/v6.0&id=4e5987d8ecbc8647dee0aebfb73c3890843ef5dd

> I've not worked the cgroup thread stuff out yet ... it doesn't appear to hook thread
> creation, only fork().

This looks very promising for our use case, as it would be very easy to
use for a container manager. I'm glad you're looking into this.

Thanks!
-Peter