Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

From: Sam Sun
Date: Wed Mar 13 2024 - 12:32:44 EST


On Wed, Mar 13, 2024 at 10:52 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Mon, Mar 04, 2024 at 10:26:28PM +0800, Sam Sun wrote:
> > Dear developers and maintainers,
> >
> > We encountered a kernel warning with our modified Syzkaller. It is
> > tested on kernel 6.8.0-rc7. C repro and kernel config are attached to
> > this email. Bug report is listed below.
>
> See if that fixes it.
>
> Thx.

I applied this patch on top of the latest mainline commit, and the C
repro no longer triggers the bug. I think the patch fixes it.
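
For anyone following along, here is a minimal userspace sketch of the
idea behind the fix as I understand it from the commit message: several
writers all run a "delete timer, re-init, re-arm" sequence, and one
mutex serializes them. This is not kernel code. toy_timer, toy_restart(),
restart_lock and run_writers() are hypothetical names made up for the
illustration, and pthreads stands in for the kernel's mutex API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer; NOT the kernel's struct timer_list. */
struct toy_timer {
	int armed;            /* 1 while the timer counts as queued */
	int init_while_armed; /* times an armed timer was re-initialised */
};

static struct toy_timer timer;

/* Plays the role that mce_sysfs_mutex plays in the patch (assumption). */
static pthread_mutex_t restart_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Delete, re-init and re-arm the timer. Loosely modelled on the commit
 * message's description of what happens after a write is accepted.
 * Must be called with restart_lock held.
 */
static void toy_restart(struct toy_timer *t)
{
	t->armed = 0;                 /* delete the current timer */
	if (t->armed)                 /* init must never see an armed timer */
		t->init_while_armed++;
	t->armed = 1;                 /* arm it again */
}

/* One sysfs-style writer: take the lock around every restart. */
static void *writer(void *arg)
{
	int iterations = *(int *)arg;

	for (int i = 0; i < iterations; i++) {
		pthread_mutex_lock(&restart_lock);
		toy_restart(&timer);
		pthread_mutex_unlock(&restart_lock);
	}
	return NULL;
}

/*
 * Run nthreads concurrent writers. Returns how many times an armed timer
 * was re-initialised, or -1 if nthreads is outside 1..16.
 */
static int run_writers(int nthreads, int iterations)
{
	pthread_t tids[16];
	int started = 0;

	if (nthreads < 1 || nthreads > 16)
		return -1;

	timer.armed = 0;
	timer.init_while_armed = 0;

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[started], NULL, writer, &iterations) != 0)
			break;        /* join only the threads that started */
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return timer.init_while_armed;
}
```

Calling run_writers(4, 1000) should report zero re-inits of an armed
timer, because no writer can arm the timer between another writer's
delete and init steps while the lock is held. Dropping the lock calls
would reopen that window, which is the situation the ODEBUG warning
flagged.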

Best Regards,
Yue

>
> ---
> From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>
> Date: Wed, 13 Mar 2024 14:48:27 +0100
> Subject: [PATCH] x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Modifying an MCA bank's MCA_CTL bits, which control which error types
> are reported, is done over the
>
> /sys/devices/system/machinecheck/
> ├── machinecheck0
> │   ├── bank0
> │   ├── bank1
> │   ├── bank10
> │   ├── bank11
> ...
>
> sysfs nodes by writing the new bit mask of events to enable.
>
> When the write is accepted, the kernel deletes all current timers and
> reinits all banks.
>
> Doing that in parallel can lead to initializing a timer which is already
> armed and in the timer wheel, i.e., in use already:
>
> ODEBUG: init active (active state 0) object: ffff888063a28000 object
> type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
> WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514
> debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514
>
> Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
> does.
>
> Reported-by: Yue Sun <samsun1006219@xxxxxxxxx>
> Reported-by: xingwei lee <xrivendell7@xxxxxxxxx>
> Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@xxxxxxxxxxxxxx
> ---
>  arch/x86/kernel/cpu/mce/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index b5cc557cfc37..84d41be6d06b 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -2500,12 +2500,14 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
>  		return -EINVAL;
>
>  	b = &per_cpu(mce_banks_array, s->id)[bank];
> -
>  	if (!b->init)
>  		return -ENODEV;
>
>  	b->ctl = new;
> +
> +	mutex_lock(&mce_sysfs_mutex);
>  	mce_restart();
> +	mutex_unlock(&mce_sysfs_mutex);
>
>  	return size;
>  }
> --
> 2.43.0
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette