Re: [PATCH] x86/resctrl: Add tracepoint for llc_occupancy tracking

From: Haifeng Xu
Date: Mon Feb 19 2024 - 21:26:31 EST




On 2024/2/6 03:36, Reinette Chatre wrote:
> Hi Haifeng,
>
> Thank you for proposing this feature. I think it is a useful addition.
> I tried it out after removing a monitor group and it was insightful
> to be able to see how the cache occupancy of a removed group
> shrinks over time.
>
> On 1/26/2024 5:02 AM, Haifeng Xu wrote:
>> If llc_occupany is enabled, the rmid may not be freed immediately unless
>
> (llc_occupany -> llc_occupancy ... one more instance below)
>
> Please use caps (RMID) for acronym.
>
>> its llc_occupany is less than the resctrl_rmid_realloc_threshold.
>
> I think it will help to highlight that this is related to monitor group
> removal.
>
>>
>> In our production environment, those unused rmids get stuck in the limbo
>> list forever because their llc_occupancy are larger than the threshold.
>> After turning it up, we can successfully free unused rmids and create
>> new monitor groups. In order to acquire the llc_occupancy of rmids in
>> each rdt domain, we use perf tool to track and filter the log manually.
>
> Could you please elaborate how you "use perf tool to track and
> filter the log manually"?
>
>>
>> It's not efficient enough. Therefore, we can add a new tracepoint that
>
> "It's not efficient enough." is a vague statement. How was efficiency measured?
> and what is "enough"?
>
> Please do not impersonate code ("we can add a new ...") and stick to the
> imperative tone. Please see the "Changelog" section in
> Documentation/process/maintainer-tip.rst for more details.
>
>
>> shows the llc_occupancy of busy rmids when scanning the limbo list. It
>> can help us to adjust the resctrl_rmid_realloc_threshold to a reasonable
>> value.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@xxxxxxxxxx>
>> ---
>> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
>> arch/x86/kernel/cpu/resctrl/monitor.c | 5 ++++
>> arch/x86/kernel/cpu/resctrl/monitor_event.h | 30 +++++++++++++++++++++
>> 3 files changed, 36 insertions(+)
>> create mode 100644 arch/x86/kernel/cpu/resctrl/monitor_event.h
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
>> index 4a06c37b9cf1..0d3031850d26 100644
>> --- a/arch/x86/kernel/cpu/resctrl/Makefile
>> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
>> @@ -1,4 +1,5 @@
>> # SPDX-License-Identifier: GPL-2.0
>> obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
>> obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o pseudo_lock.o
>> +CFLAGS_monitor.o = -I$(src)
>> CFLAGS_pseudo_lock.o = -I$(src)
>
> From what I understand only one of the c files should define CREATE_TRACE_POINTS
> and it is that c file that should have its CFLAGS modified.
>
> I do not think it is necessary to preemptively fragment the resctrl tracing by creating
> a separate header file for the monitor tracepoints. This is something that may be
> needed as part of current re-design but I think it best to have a simple start that
> places all tracepoints in the same header file.
>
> I would thus like to propose that this work starts by renaming pseudo_lock_event.h
> to a more generic (trace.h?) that will contain all the tracepoints. pseudo-lock.c
> is the file that define's CREATE_TRACE_POINTS so it should remain as the only
> one with its CFLAGS modified. monitor.c should be able to just include "trace.h"
> (after "trace.h" is updated with the new tracepoint) without needing to define
> CREATE_TRACE_POINTS.
>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f136ac046851..a6f94fcae174 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -24,6 +24,10 @@
>>
>> #include "internal.h"
>>
>> +#define CREATE_TRACE_POINTS
>> +#include "monitor_event.h"
>> +#undef CREATE_TRACE_POINTS
>> +
>> struct rmid_entry {
>> u32 rmid;
>> int busy;
>> @@ -302,6 +306,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>> }
>> }
>> crmid = nrmid + 1;
>> + trace_rmid_llc_occupancy(nrmid, d->id, val);
>> }
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor_event.h b/arch/x86/kernel/cpu/resctrl/monitor_event.h
>> new file mode 100644
>> index 000000000000..91265a2dd2c9
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor_event.h
>> @@ -0,0 +1,30 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM resctrl
>> +
>> +#if !defined(_TRACE_MONITOR_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_MONITOR_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +TRACE_EVENT(rmid_llc_occupancy,
>> + TP_PROTO(u32 rmid, int id, u64 occupancy),
>> + TP_ARGS(rmid, id, occupancy),
>> + TP_STRUCT__entry(__field(u32, rmid)
>> + __field(int, id)
>> + __field(u64, occupancy)),
>> + TP_fast_assign(__entry->rmid = rmid;
>> + __entry->id = id;
>> + __entry->occupancy = occupancy;),
>> + TP_printk("rmid=%u domain=%d occupancy=%llu",
>> + __entry->rmid, __entry->id, __entry->occupancy)
>> + );
>> +
>
> Naming is always complicated but I would like to make two proposals with
> motivation:
> a) Please prefix this event with "mon_" as the first member of a new
> "monitor" category of resctrl tracepoints. Users can then interact
> with monitor tracepoints with convenience of "mon*". Later we could
> perhaps add a new "alloc*" category.
> b) Please replace all "rmid" exposure to user space with a more generic
> "mon_hw_id", or if "mon" is clear, just "hw_id". For reference you
> can search for "mon_hw_id" in Documentation/arch/x86/resctrl.rst.
> This is needed because resctrl is in the process of being able to
> be an interface for Arm's MPAM and "rmid" is an x86 specific term.
>
> Considering this, what do you think of something like:
> tracepoint name: mon_llc_occupancy_limbo
> print: "mon_hw_id=%u domain=%d llc_occupancy=%llu"
>
>> +#endif /* _TRACE_MONITOR_H */
>> +
>> +#undef TRACE_INCLUDE_PATH
>> +#define TRACE_INCLUDE_PATH .
>> +#define TRACE_INCLUDE_FILE monitor_event
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>
> Thank you.
>
> Reinette


Thanks for your suggestions and I'll post a new version later.