Re: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid structures by index

From: James Morse
Date: Mon Jan 22 2024 - 13:36:12 EST


Hi Amit,

On 21/01/2024 10:27, Amit Singh Tomar wrote:
> -----Original Message-----
> From: James Morse <james.morse@xxxxxxx>
> Sent: Monday, December 11, 2023 8:03 PM
> To: Amit Singh Tomar <amitsinght@xxxxxxxxxxx>; x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>; Reinette Chatre <reinette.chatre@xxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>; H Peter Anvin <hpa@xxxxxxxxx>; Babu Moger <Babu.Moger@xxxxxxx>; shameerali.kolothum.thodi@xxxxxxxxxx; D Scott Phillips OS <scott@xxxxxxxxxxxxxxxxxxxxxx>; carl@xxxxxxxxxxxxxxxxxxxxxx; Linu Cherian <lcherian@xxxxxxxxxxx>; bobo.shaobowang@xxxxxxxxxx; tan.shaopeng@xxxxxxxxxxx; baolin.wang@xxxxxxxxxxxxxxxxx; Jamie Iles <quic_jiles@xxxxxxxxxxx>; Xin Hao <xhao@xxxxxxxxxxxxxxxxx>; peternewman@xxxxxxxxxx; dfustini@xxxxxxxxxxxx; muhammad.zahid@xxxxxxxxx
> Subject: Re: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid structures by index

> On 31/10/2023 07:43, Amit Singh Tomar wrote:
>> -----Original Message-----
>> From: James Morse <james.morse@xxxxxxx>
>> Sent: Wednesday, October 25, 2023 11:33 PM
>> Subject: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid
>> structures by index

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 2a0233cd0bc9..c02cf32cd17c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -735,19 +768,20 @@ void mbm_setup_overflow_handler(struct
>> rdt_domain *dom, unsigned long delay_ms)
>>
>> static int dom_data_init(struct rdt_resource *r) {
>> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> struct rmid_entry *entry = NULL;
>> - int i, nr_rmids;
>> + u32 idx;
>> + int i;
>>
>> - nr_rmids = r->num_rmid;
>> - rmid_ptrs = kcalloc(nr_rmids, sizeof(struct rmid_entry), GFP_KERNEL);
>> + rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry),
>> +GFP_KERNEL);
>>
>> [>>] Is there a chance this could result in ZERO_SIZE_PTR, and should we guard against it with ZERO_OR_NULL_PTR in the following if condition?
>> It might be related: while testing the snapshot[1] (subsequent snapshots have a similar change) on an x86 platform, Zahid is seeing a kernel panic:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/fs/resctrl/monitor.c?h=mpam/snapshot/v6.2#n695
>
> Interesting - I didn't think this could happen. Could you share the full splat?


(I assume this bit is your reply:)

> Unfortunately, I don't have access to the test set-up where this splat has been observed.
> However, I have requested Zahid (Cc) to provide the splat logs.
> Additionally, from what I've learned, this splat has been observed on an x86 machine that
> doesn't support monitor groups. Do you see this as a problem?



> This would imply idx_limit was zero, so boot_cpu_data.x86_cache_max_rmid would be -1.
> But wouldn't this happen before this patch? idx_limit has the same value as nr_rmids on x86,
> it's only MPAM that needs a different value.

Your 'doesn't support monitor groups' explains why boot_cpu_data.x86_cache_max_rmid is -1.

As you've said you're testing the whole tree - not this series, I suspect this is coming
from "x86/resctrl: Move monitor init work to a resctrl init call", which moves
initialisation of filesystem structures to filesystem code.

It looks like I missed that get_rdt_mon_resources() can bail out before calling
rdt_get_mon_l3_config(), which I think would explain what you hint at here.
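
To illustrate why a plain NULL check would not catch this, here is a rough
userspace sketch. It is not kernel code: every model_*/MODEL_* name is
hypothetical, and model_kcalloc() only imitates the kernel allocator's
behaviour of handing back ZERO_SIZE_PTR for a zero-sized request. The constant
mirrors the definition in include/linux/slab.h; the kernel macro casts to
unsigned long where this sketch uses uintptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same values as include/linux/slab.h, redefined here for userspace. */
#define MODEL_ZERO_SIZE_PTR ((void *)16)
#define MODEL_ZERO_OR_NULL_PTR(x) \
	((uintptr_t)(x) <= (uintptr_t)MODEL_ZERO_SIZE_PTR)

/* Hypothetical stand-in for struct rmid_entry; only its size matters. */
struct model_rmid_entry {
	unsigned int closid;
	unsigned int rmid;
	int busy;
};

/* x86 derives num_rmid as max_rmid + 1, so a max of -1 yields zero. */
static unsigned int model_num_rmid(int max_rmid)
{
	assert(max_rmid >= -1);
	return (unsigned int)(max_rmid + 1);
}

/* Userspace model of kcalloc(): zero-sized requests get ZERO_SIZE_PTR. */
static void *model_kcalloc(size_t n, size_t size)
{
	if (n == 0 || size == 0)
		return MODEL_ZERO_SIZE_PTR;
	return calloc(n, size);
}

/* Like kfree(), accept NULL and ZERO_SIZE_PTR without touching the heap. */
static void model_kfree(void *p)
{
	if (!MODEL_ZERO_OR_NULL_PTR(p))
		free(p);
}

/* Returns 1 when "if (!ptr)" would miss an unusable zero-size pointer. */
static int model_null_check_passes(int max_rmid)
{
	void *ptrs = model_kcalloc(model_num_rmid(max_rmid),
				   sizeof(struct model_rmid_entry));
	int passes = (ptrs != NULL) && MODEL_ZERO_OR_NULL_PTR(ptrs);

	model_kfree(ptrs);
	return passes;
}
```

With max_rmid set to -1 the element count is zero, the returned pointer is
non-NULL, and any later dereference of the array would fault.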


Adding this to "x86/resctrl: Move monitor init work to a resctrl init call" should fix
that. (It'll be in the next snapshot I push.)
---------------------%<---------------------
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b3f245c85e00..791554db7c69 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -1030,12 +1030,14 @@ int resctrl_mon_resource_init(void)
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	int ret;

+	if (!r->mon_capable)
+		return 0;
+
 	ret = dom_data_init(r);
 	if (ret)
 		return ret;

-	if (r->mon_capable)
-		l3_mon_evt_init(r);
+	l3_mon_evt_init(r);

 	return 0;
 }
---------------------%<---------------------
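
For anyone wanting to check the intended control flow outside the kernel, a
minimal userspace model of the guarded init might look like the following.
This is only a sketch: struct model_resource, the model_* helpers, and the
call counters are all hypothetical stand-ins, and the real dom_data_init() and
l3_mon_evt_init() do far more than count calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct rdt_resource used here. */
struct model_resource {
	bool mon_capable;
	unsigned int num_rmid;
};

/* Call counters so a test can observe which steps ran. */
static int model_dom_data_init_calls;
static int model_evt_init_calls;

/* Stand-in for dom_data_init(): records the call and reports success. */
static int model_dom_data_init(const struct model_resource *r)
{
	(void)r;
	model_dom_data_init_calls++;
	return 0;
}

/* Stand-in for l3_mon_evt_init(): records the call. */
static void model_l3_mon_evt_init(const struct model_resource *r)
{
	(void)r;
	model_evt_init_calls++;
}

/* Mirrors the patched flow: skip all monitor setup when not mon_capable. */
static int model_mon_resource_init(const struct model_resource *r)
{
	int ret;

	assert(r != NULL);

	if (!r->mon_capable)
		return 0;

	ret = model_dom_data_init(r);
	if (ret)
		return ret;

	model_l3_mon_evt_init(r);

	return 0;
}
```

On a resource with mon_capable clear, neither helper runs, so the zero-sized
allocation is never attempted.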


Thanks,

James