RE: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid structures by index

From: Amit Singh Tomar
Date: Sun Jan 21 2024 - 05:28:55 EST


Hi James,

-----Original Message-----
From: James Morse <james.morse@xxxxxxx>
Sent: Monday, December 11, 2023 8:03 PM
To: Amit Singh Tomar <amitsinght@xxxxxxxxxxx>; x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>; Reinette Chatre <reinette.chatre@xxxxxxxxx>; Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Borislav Petkov <bp@xxxxxxxxx>; H Peter Anvin <hpa@xxxxxxxxx>; Babu Moger <Babu.Moger@xxxxxxx>; shameerali.kolothum.thodi@xxxxxxxxxx; D Scott Phillips OS <scott@xxxxxxxxxxxxxxxxxxxxxx>; carl@xxxxxxxxxxxxxxxxxxxxxx; Linu Cherian <lcherian@xxxxxxxxxxx>; bobo.shaobowang@xxxxxxxxxx; tan.shaopeng@xxxxxxxxxxx; baolin.wang@xxxxxxxxxxxxxxxxx; Jamie Iles <quic_jiles@xxxxxxxxxxx>; Xin Hao <xhao@xxxxxxxxxxxxxxxxx>; peternewman@xxxxxxxxxx; dfustini@xxxxxxxxxxxx; muhammad.zahid@xxxxxxxxx
Subject: Re: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid structures by index

Hi Amit,

On 31/10/2023 07:43, Amit Singh Tomar wrote:
> -----Original Message-----
> From: James Morse <james.morse@xxxxxxx>
> Sent: Wednesday, October 25, 2023 11:33 PM
> Subject: [EXT] [PATCH v7 06/24] x86/resctrl: Access per-rmid structures by index

Looks like you are afflicted with outlook - let me know if I didn't find all the changes you made to the original message ...

[..]

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 2a0233cd0bc9..c02cf32cd17c 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -735,19 +768,20 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>
> static int dom_data_init(struct rdt_resource *r)
> {
> + u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> struct rmid_entry *entry = NULL;
> - int i, nr_rmids;
> + u32 idx;
> + int i;
>
> - nr_rmids = r->num_rmid;
> - rmid_ptrs = kcalloc(nr_rmids, sizeof(struct rmid_entry), GFP_KERNEL);
> + rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
>
> [>>] Is there a chance this could result in ZERO_SIZE_PTR, and should we guard against it with ZERO_OR_NULL_PTR in the following if condition?
> It might be related: while testing the snapshot[1] (subsequent snapshots have a similar change) on an x86 platform, Zahid is seeing a kernel panic:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/fs/resctrl/monitor.c?h=mpam/snapshot/v6.2#n695
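
For readers following the ZERO_SIZE_PTR question, here is a minimal userspace sketch of why a plain NULL check does not catch a zero-sized allocation. The two macros mirror the definitions in include/linux/slab.h; model_kcalloc(), model_kfree() and the two init_* helpers are hypothetical stand-ins written for this illustration, not kernel code, and the sketch makes no claim about what caused the reported panic.

```c
#include <stdlib.h>

/* Same definitions as include/linux/slab.h. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
				(unsigned long)ZERO_SIZE_PTR)

/*
 * Hypothetical userspace model of kcalloc(): a zero-byte request
 * returns ZERO_SIZE_PTR instead of NULL, as the slab allocator does.
 */
static void *model_kcalloc(size_t n, size_t size)
{
	if (n == 0 || size == 0)
		return ZERO_SIZE_PTR;
	return calloc(n, size);
}

/* Like kfree(), ignore both NULL and ZERO_SIZE_PTR. */
static void model_kfree(void *p)
{
	if (!ZERO_OR_NULL_PTR(p))
		free(p);
}

/* Plain NULL check: returns 0 even when idx_limit is 0. */
int init_with_null_check(size_t idx_limit)
{
	void *ptrs = model_kcalloc(idx_limit, sizeof(long));

	if (!ptrs)
		return -1;
	/* With idx_limit == 0, ptrs is ZERO_SIZE_PTR here. */
	model_kfree(ptrs);
	return 0;
}

/* ZERO_OR_NULL_PTR check: rejects the zero-sized case as well. */
int init_with_zero_or_null_check(size_t idx_limit)
{
	void *ptrs = model_kcalloc(idx_limit, sizeof(long));

	if (ZERO_OR_NULL_PTR(ptrs))
		return -1;
	model_kfree(ptrs);
	return 0;
}
```

In this model, init_with_null_check(0) succeeds while holding a pointer that must never be dereferenced, whereas init_with_zero_or_null_check(0) fails early. Whether failing early is the right behaviour for dom_data_init() is a separate question.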

Interesting - I didn't think this could happen. Could you share the full splat?

Unfortunately, I don't have access to the test setup where this splat was observed. However, I have asked Zahid (Cc'd) to provide the splat logs.
Additionally, from what I've learned, the splat was observed on an x86 machine that doesn't support monitor groups. Do you see this as a problem?


This would imply idx_limit was zero, so boot_cpu_data.x86_cache_max_rmid would be -1.
But wouldn't this happen before this patch? idx_limit has the same value as nr_rmids on x86; it's only MPAM that needs a different value.
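
The arithmetic behind that argument can be sketched in userspace as follows. It assumes, based on the discussion above, that on x86 both the old nr_rmids and the new idx_limit come out as the maximum RMID plus one. struct model_cpuinfo and both helper functions are hypothetical names invented for this illustration.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-in for boot_cpu_data.x86_cache_max_rmid. */
struct model_cpuinfo {
	int x86_cache_max_rmid;	/* -1 when monitoring is unsupported */
};

/* Before the patch (assumed): nr_rmids = r->num_rmid = max_rmid + 1. */
int model_nr_rmids(const struct model_cpuinfo *c)
{
	return c->x86_cache_max_rmid + 1;
}

/* After the patch (assumed x86 behaviour): same value, typed as u32. */
u32 model_num_rmid_idx(const struct model_cpuinfo *c)
{
	return (u32)(c->x86_cache_max_rmid + 1);
}
```

With x86_cache_max_rmid set to -1, both helpers return 0, so under these assumptions the allocation size is zero both before and after the patch.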


Thanks,
-Amit