Re: [PATCH 2/2] x86/MCE: Add command line option to extend MCE Records pool

From: Borislav Petkov
Date: Mon Feb 12 2024 - 17:42:51 EST


On Mon, Feb 12, 2024 at 11:19:13PM +0100, Borislav Petkov wrote:
> On Mon, Feb 12, 2024 at 11:08:33PM +0100, Borislav Petkov wrote:
> > I'll have to dig into my archives tomorrow, on a clear head...
>
> So I checked out 648ed94038c030245a06e4be59744fd5cdc18c40 which is
> 4.2-something.
>
> And even back then, mcheck_cpu_init() gets called *after* mm_init()
> which already initializes the allocators. So why did we allocate that
> buffer statically?

Found it in my archives. You should have it too:

Date: Thu, 31 Jul 2014 02:51:25 -0400
From: "Chen, Gong" <gong.chen@xxxxxxxxxxxxxxx>
To: tony.luck@xxxxxxxxx, bp@xxxxxxxxx
Subject: Re: [RFC PATCH untest v2 1/4] x86, MCE: Provide a lock-less memory pool to save error record
Message-ID: <20140731065125.GA5999@xxxxxxxxxxxxxxxxxx>

and that's not on any ML, which is why I can't find it on lore...

There's this fragment from Chen:

--------
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index bb92f38..a1b6841 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2023,6 +2023,9 @@ int __init mcheck_init(void)
>  {
>  	mcheck_intel_therm_init();
>  
> +	if (!mce_genpool_init())
> +		mca_cfg.disabled = true;
> +
when setup_arch is called, memory subsystem hasn't been initialized,
which means I can't use regular page allocation function. So I still
need to put genpool init in mcheck_cpu_init.
--------

And that is still the case - mcheck_init() gets called from setup_arch()
and thus before mm_init(), which is now called mm_core_init().

And on that same thread we agreed that we should allocate it statically,
but then the call to mce_gen_pool_init() ended up in mcheck_cpu_init(),
which happens *after* mm_init().
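A minimal userspace sketch of the ordering problem, assuming a much simplified boot sequence. Every name in it (mm_ready, model_alloc, model_pool_init, model_pool_reset, boot_model, MODEL_POOL_SIZE) is hypothetical and is not kernel API. It only models why a page-allocator-backed init fails when reached via setup_arch() -> mcheck_init(), before mm_core_init(), while a statically reserved buffer works at either point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: becomes true once the modeled mm_core_init() has run. */
static bool mm_ready;

/* Arbitrary pool size, for illustration only. */
#define MODEL_POOL_SIZE (2 * 4096)

/* Statically reserved backing store: usable before the allocator is up. */
static char static_pool[MODEL_POOL_SIZE];

/* Backing memory the modeled pool ended up with (NULL on failure). */
static void *pool_mem;

/* Hypothetical stand-in for a page allocation: fails before mm init. */
static void *model_alloc(size_t size)
{
	return mm_ready ? malloc(size) : NULL;
}

/* Release a dynamically allocated backing store, if any. */
static void model_pool_reset(void)
{
	if (pool_mem && pool_mem != (void *)static_pool)
		free(pool_mem);
	pool_mem = NULL;
}

/* Hypothetical pool init: use the static buffer or allocate dynamically. */
static bool model_pool_init(bool use_static)
{
	pool_mem = use_static ? (void *)static_pool
			      : model_alloc(MODEL_POOL_SIZE);
	return pool_mem != NULL;
}

/*
 * Model one boot and return whether the pool init succeeded.
 * early == true : init runs from mcheck_init(), i.e. inside setup_arch()
 * early == false: init runs from mcheck_cpu_init(), after mm_core_init()
 */
static bool boot_model(bool early, bool use_static)
{
	bool ok = false;

	model_pool_reset();
	mm_ready = false;

	/* setup_arch() -> mcheck_init(): allocator not available yet */
	if (early)
		ok = model_pool_init(use_static);

	/* mm_core_init(): allocator comes up */
	mm_ready = true;

	/* later: mcheck_cpu_init() */
	if (!early)
		ok = model_pool_init(use_static);

	return ok;
}
```

In this model boot_model(true, false) fails, while boot_model(true, true), boot_model(false, false) and boot_model(false, true) all succeed: the static buffer only matters if the init really runs from mcheck_init(), and once it sits in mcheck_cpu_init() a regular allocation would do.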

What a big fscking facepalm. :-\

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette