Re: [PATCH v2 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking

From: James Morse
Date: Fri Mar 01 2024 - 12:48:43 EST


Hi Reinette,

On 23/02/2024 19:41, Reinette Chatre wrote:
> On 2/21/2024 1:21 AM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how much the threshold should be, the following steps can
>> be taken to acquire the llc_occupancy of RMIDs in each rdt domain:
>>
>> 1) perf probe -a '__rmid_read eventid rmid'
>> perf probe -a '__rmid_read%return $retval'
>> 2) perf record -e probe:__rmid_read -e probe:__rmid_read__return -aR sleep 10
>> 3) perf script > __rmid_read.txt
>> 4) cat __rmid_read.txt | grep "eventid=0x1 " -A 1 | grep "kworker" > llc_occupnacy.txt

Ah, this ftrace trickery. It wouldn't be portable - I agree a tracepoint is much better!


> The details on how perf can be used was useful during the discussion of this
> work but can be omitted from this changelog.
>
>> Instead of using perf tool to track llc_occupancy and filter the log manually,
>> it is more convenient for users to use tracepoint to do this work. So add a new
>> tracepoint that shows the llc_occupancy of busy RMIDs when scanning the limbo
>> list.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f136ac046851..1533b1932b49 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -23,6 +23,7 @@
>> #include <asm/resctrl.h>
>>
>> #include "internal.h"
>> +#include "trace.h"
>>
>> struct rmid_entry {
>> u32 rmid;
>> @@ -302,6 +303,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>> }
>> }
>> crmid = nrmid + 1;
>> + trace_mon_llc_occupancy_limbo(nrmid, d->id, val);

> This area recently received some changes (you can find the latest on the
> x86/cache branch of the tip repo). Please see [1] for a good
> description of the new "index". For this tracing to be useful to MPAM
> I thus expect that the tracepoint will need to print the MPAM equivalent
> to CLOSID, the PARTID. We can refer to this CLOSID/PARTID value as
> "ctrl_hw_id".
>
> This snippet can then change to use the new resctrl_arch_rmid_idx_decode()
> to learn the "ctrl_hw_id" and "mon_hw_id" and print it as part of
> tracepoint:
> "ctrl_hw_id=%u mon_hw_id=%u domain=%d llc_occupancy=%llu"
>
> This will be filesystem code so it cannot know how an architecture
> treats these numbers. Consequently, this may look strange to x86 users
> when ctrl_hw_id will always be X86_RESCTRL_EMPTY_CLOSID ... but it should
> be clear that it is invalid?


> James, what do you think? Any thoughts on how MPAM will use the limbo handler
> to understand what information can be useful to the user here?

Initially it will be exactly the same, and this certainly works. I agree outputting both
the CLOSID and RMID (with more portable names) is the right thing to do.

I'll reply in more detail on what appears to be v3.
lore.kernel.org/r/20240229071125.100991-1-haifeng.xu@xxxxxxxxxx


Thanks,

James