Re: [Bug] WARNING: ODEBUG bug in __mcheck_cpu_init_timer

From: Borislav Petkov
Date: Wed Mar 13 2024 - 10:52:57 EST


On Mon, Mar 04, 2024 at 10:26:28PM +0800, Sam Sun wrote:
> Dear developers and maintainers,
>
> We encountered a kernel warning with our modified Syzkaller. It is
> tested on kernel 6.8.0-rc7. C repro and kernel config are attached to
> this email. Bug report is listed below.

See if that fixes it.

Thx.

---
From: "Borislav Petkov (AMD)" <bp@xxxxxxxxx>
Date: Wed, 13 Mar 2024 14:48:27 +0100
Subject: [PATCH] x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Modifying an MCA bank's MCA_CTL bits, which control which error types are
reported, is done over

/sys/devices/system/machinecheck/
├── machinecheck0
│   ├── bank0
│   ├── bank1
│   ├── bank10
│   ├── bank11
...

sysfs nodes by writing the new bit mask of events to enable.

When the write is accepted, the kernel deletes all current MCE timers and
then reinitializes all banks, including the per-CPU timers.

Doing that from several writers in parallel can lead to one writer
initializing a timer which another writer has already armed and put in
the timer wheel, i.e., a timer which is already in use:

ODEBUG: init active (active state 0) object: ffff888063a28000 object type: timer_list hint: mce_timer_fn+0x0/0x240 arch/x86/kernel/cpu/mce/core.c:2642
WARNING: CPU: 0 PID: 8120 at lib/debugobjects.c:514 debug_print_object+0x1a0/0x2a0 lib/debugobjects.c:514

Fix that by grabbing the sysfs mutex as the rest of the MCA sysfs code
does.

Reported-by: Yue Sun <samsun1006219@xxxxxxxxx>
Reported-by: xingwei lee <xrivendell7@xxxxxxxxx>
Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
Cc: <stable@xxxxxxxxxx>
Link: https://lore.kernel.org/r/CAEkJfYNiENwQY8yV1LYJ9LjJs%2Bx_-PqMv98gKig55=2vbzffRw@xxxxxxxxxxxxxx
---
arch/x86/kernel/cpu/mce/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5cc557cfc37..84d41be6d06b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2500,12 +2500,14 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 		return -EINVAL;
 
 	b = &per_cpu(mce_banks_array, s->id)[bank];
-
 	if (!b->init)
 		return -ENODEV;
 
 	b->ctl = new;
+
+	mutex_lock(&mce_sysfs_mutex);
 	mce_restart();
+	mutex_unlock(&mce_sysfs_mutex);
 
 	return size;
 }
--
2.43.0

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette