Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

From: Reinette Chatre
Date: Wed Mar 13 2024 - 18:47:41 EST


Hi Haifeng,

On 3/7/2024 11:41 PM, Haifeng Xu wrote:
> In our production environment, after removing monitor groups, those unused
> RMIDs get stuck in the limbo list forever because their llc_occupancy are
> always larger than the threshold. But the unused RMIDs can be successfully
> freed by turning up the threshold.
>
> In order to know how much the threshold should be, perf can be used to
> acquire the llc_occupancy of RMIDs in each rdt domain.
>
> Instead of using perf tool to track llc_occupancy and filter the log
> manually, it is more convenient for users to use tracepoint to do this
> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
> when scanning the limbo list.
>
> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx>
> Suggested-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> Suggested-by: James Morse <james.morse@xxxxxxx>
> Reviewed-by: James Morse <james.morse@xxxxxxx>
> ---
> Documentation/arch/x86/resctrl.rst | 8 ++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++++
> arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++
> 3 files changed, 33 insertions(+)
>
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index a6279df64a9d..dd3507dc765c 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
> each bit represents 5% of the capacity of the cache. You could partition
> the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>
> +Tracepoint - mon_llc_occupancy_limbo
> +------------------------------------

I think that the below paragraph would fit nicely as a new paragraph in the
existing "max_threshold_occupancy - generic concepts" section. To support that
just one change to text below ...

> +This tracepoint gives you the precise occupancy values for a subset of RMID

The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
for a subset of RMID ...

> +that are not immediately available for allocation. This can't be relied on
> +to produce output every second, it may be necessary to attempt to create an
> +empty monitor group to force an update. Output may only be produced if creation
> +of a control or monitor group fails.
> +
> Memory bandwidth Allocation and monitoring
> ==========================================
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c34a35ec0f03..60b6a29a9e29 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -24,6 +24,7 @@
> #include <asm/resctrl.h>
>
> #include "internal.h"
> +#include "trace.h"
>
> /**
> * struct rmid_entry - dirty tracking for all RMID.
> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
> rmid_dirty = true;
> } else {
> rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
> +
> + /* x86's CLOSID and RMID are independent numbers, so the entry's
> + * closid is a invalid CLOSID. But on arm64, the RMID value isn't
> + * a unique number for each CLOSID. It's necessary to track both
> + * CLOSID and RMID because there may be dependencies between each
> + * other on some architectures.
> + */

Please watch for proper formatting of multi-line comment and consistent capitalization.
I also think comment can be more accurate, for example:

/*
* x86's CLOSID and RMID are independent numbers, so the entry's
* CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
* RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
* to select the configuration. It is thus necessary to track both
* CLOSID and RMID because there may be dependencies between them
* on some architectures.
*/

> + trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
> }
>
> if (force_free || !rmid_dirty) {
> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
> index ed5c66b8ab0b..b310b4985b94 100644
> --- a/arch/x86/kernel/cpu/resctrl/trace.h
> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
> TP_printk("hits=%llu miss=%llu",
> __entry->l3_hits, __entry->l3_miss));
>
> +TRACE_EVENT(mon_llc_occupancy_limbo,
> + TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
> + TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
> + TP_STRUCT__entry(__field(u32, ctrl_hw_id)
> + __field(u32, mon_hw_id)
> + __field(int, domain_id)
> + __field(u64, llc_occupancy_bytes)),
> + TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
> + __entry->mon_hw_id = mon_hw_id;
> + __entry->domain_id = domain_id;
> + __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
> + TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",

domain_d -> domain_id

> + __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
> + __entry->llc_occupancy_bytes)
> + );
> +
> #endif /* _TRACE_RESCTRL_H */
>
> #undef TRACE_INCLUDE_PATH


Reinette