Re: [PATCH v4 01/24] x86/resctrl: Track the closid with the rmid

From: Reinette Chatre
Date: Thu Jun 15 2023 - 18:01:30 EST


Hi James,

On 5/25/2023 11:01 AM, James Morse wrote:
> x86's RMID are independent of the CLOSID. An RMID can be allocated,
> used and freed without considering the CLOSID.
>
> MPAM's equivalent feature is PMG, which is not an independent number,
> it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of
> 'RMID' can be allocated for a single CLOSID.
> i.e. if there is 1 bit of PMG space, then each CLOSID can have two
> monitor groups.
>
> To allow resctrl to disambiguate RMID values for different CLOSID,
> everything in resctrl that keeps an RMID value needs to know the CLOSID
> too. This will always be ignored on x86.
>
> Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxx>
> Reviewed-by: Xin Hao <xhao@xxxxxxxxxxxxxxxxx>
> Signed-off-by: James Morse <james.morse@xxxxxxx>
>
> ---
> Is there a better term for 'the unique identifier for a monitor group'.
> Using RMID for that here may be confusing...
>
> Changes since v1:
> * Added comment in struct rmid_entry
>
> Changes since v2:
> * Moved X86_RESCTRL_BAD_CLOSID from a subsequent patch
> ---
> arch/x86/include/asm/resctrl.h | 7 +++
> arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
> arch/x86/kernel/cpu/resctrl/monitor.c | 65 ++++++++++++++---------
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 ++---
> include/linux/resctrl.h | 11 +++-
> 6 files changed, 64 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> index 255a78d9d906..e906070285fb 100644
> --- a/arch/x86/include/asm/resctrl.h
> +++ b/arch/x86/include/asm/resctrl.h
> @@ -7,6 +7,13 @@
> #include <linux/sched.h>
> #include <linux/jump_label.h>
>
> +/*
> + * This value can never be a valid CLOSID, and is used when mapping a
> + * (closid, rmid) pair to an index and back. On x86 only the RMID is
> + * needed.
> + */
> +#define X86_RESCTRL_BAD_CLOSID ((u32)~0)
> +

I do not think this name is appropriate considering that it is later used
in x86 normal operation. I understand naming is complicated, could
something like X86_RESCTRL_EMPTY_CLOSID be appropriate?

> /**
> * struct resctrl_pqr_state - State cache for the PQR MSR
> * @cur_rmid: The cached Resource Monitoring ID

...

> @@ -628,7 +640,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
> * the software controller explicitly.
> */
> if (is_mba_sc(NULL))
> - mbm_bw_count(rmid, &rr);
> + mbm_bw_count(closid, rmid, &rr);
> }
> }
>
> @@ -685,11 +697,11 @@ void mbm_handle_overflow(struct work_struct *work)
> d = container_of(work, struct rdt_domain, mbm_over.work);
>
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> - mbm_update(r, d, prgrp->mon.rmid);
> + mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
>
> head = &prgrp->mon.crdtgrp_list;
> list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> - mbm_update(r, d, crgrp->mon.rmid);
> + mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);
>
> if (is_mba_sc(NULL))
> update_mba_bw(prgrp, d);
> @@ -732,10 +744,11 @@ static int dom_data_init(struct rdt_resource *r)
> }
>
> /*
> - * RMID 0 is special and is always allocated. It's used for all
> - * tasks that are not monitored.
> + * RMID 0 is special and is always allocated. It's used for the

Considering this change it may be more specific to now state that both
CLOSID 0 and RMID 0 are special.

> + * default_rdtgroup control group, which will be setup later. See

default_rdtgroup -> rdtgroup_default

> + * rdtgroup_setup_root().
> */
> - entry = __rmid_entry(0);
> + entry = __rmid_entry(0, 0);
> list_del(&entry->list);
>
> return 0;

...


> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 8334eeacfec5..7d80bae05f59 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -225,6 +225,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> * for this resource and domain.
> * @r: resource that the counter should be read from.
> * @d: domain that the counter should be read from.
> + * @closid: closid that matches the rmid. The counter may
> + * match traffic of both closid and rmid, or rmid only.

It may be helpful to be specific in the references to parameters:
closid that matches @rmid. The counter may
match traffic of both @closid and @rmid, or @rmid only.

The "counter may match" text is not specific. Can detail be added
on what decides whether a counter matches both closid and rmid, or
rmid only?

> * @rmid: rmid of the counter to read.
> * @eventid: eventid to read, e.g. L3 occupancy.
> * @val: result of the counter read in bytes.
> @@ -235,20 +237,25 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> * 0 on success, or -EIO, -EINVAL etc on error.
> */
> int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
> - u32 rmid, enum resctrl_event_id eventid, u64 *val);
> + u32 closid, u32 rmid, enum resctrl_event_id eventid,
> + u64 *val);
> +
>
> /**
> * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> * and eventid.
> * @r: The domain's resource.
> * @d: The rmid's domain.
> + * @closid: The closid that matches the rmid. Counters may match both
> + * closid and rmid, or rmid only.

Same here.

> * @rmid: The rmid whose counter values should be reset.
> * @eventid: The eventid whose counter values should be reset.
> *
> * This can be called from any CPU.
> */
> void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> - u32 rmid, enum resctrl_event_id eventid);
> + u32 closid, u32 rmid,
> + enum resctrl_event_id eventid);
>
> /**
> * resctrl_arch_reset_rmid_all() - Reset all private state associated with


Reinette