Re: [kernel/smp] 5408b78b7a: BUG:KASAN:out-of-bounds_in_c

From: Qian Cai
Date: Mon Jul 06 2020 - 14:49:53 EST


On Sun, Jul 05, 2020 at 10:37:03AM -0700, Paul E. McKenney wrote:
> Good catch, but someone beat you to it. This commit contains the fix:
>
> 0504bc41a62c ("kernel/smp: Provide CSD lock timeout diagnostics")

Well, I can still reproduce this on next-20200706, which contains said fix.

CSD_LOCK_WAIT_DEBUG=n

commit 0504bc41a62c4a42b9316244da7208feca7295cb
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Tue Jun 30 13:22:54 2020 -0700

    kernel/smp: Provide CSD lock timeout diagnostics

    This commit causes csd_lock_wait() to emit diagnostics when a CPU fails
    to respond quickly enough to one of the smp_call_function() family of
    function calls. These diagnostics include NMI stack traces, and so the
    exclusion of idle CPUs is also removed. These diagnostics are enabled
    by a new CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.

    This commit was inspired by an earlier patch by Josef Bacik.

    [ paulmck: Avoid 64-bit divides per kernel test robot feedback. ]
    [ paulmck: Fix for syzbot+0f719294463916a3fc0e@xxxxxxxxxxxxxxxxxxxxxxxxx ]
    Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@xxxxxxxxxx
    Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@xxxxxxxxxx
    Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Cc: Ingo Molnar <mingo@xxxxxxxxxx>
    Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
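
For readers unfamiliar with the idea in the commit message above, here is a
rough user-space sketch of "wait for a lock flag to clear, and complain if
it takes too long". This is not the kernel code: struct call_data,
LOCK_FLAG, now_ns() and lock_wait_debug() are names made up for
illustration, the diagnostic is a plain fprintf() rather than an NMI stack
trace, and the max_reports cutoff exists only so the sketch terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a "locked" bit in a per-call descriptor. */
#define LOCK_FLAG 0x1u

/* Hypothetical descriptor; the real kernel structure looks different. */
struct call_data {
	atomic_uint flags;
};

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until LOCK_FLAG clears. Each time timeout_ns elapses without the
 * flag clearing, print a diagnostic and restart the timer. Give up after
 * max_reports diagnostics (an assumption of this sketch, so that it always
 * terminates). Returns the number of diagnostics printed.
 */
static int lock_wait_debug(struct call_data *cd, uint64_t timeout_ns,
			   int max_reports)
{
	uint64_t start = now_ns();
	int reports = 0;

	assert(timeout_ns > 0 && max_reports > 0);

	while (atomic_load_explicit(&cd->flags, memory_order_acquire) & LOCK_FLAG) {
		if (now_ns() - start < timeout_ns)
			continue;

		reports++;
		fprintf(stderr, "lock wait: flag still set after %llu ns (report %d)\n",
			(unsigned long long)timeout_ns, reports);
		if (reports >= max_reports)
			break;
		start = now_ns();
	}
	return reports;
}
```

A caller would set LOCK_FLAG before handing the descriptor to another
thread and then call lock_wait_debug(&cd, timeout, n); if the other side
clears the flag in time, the function returns 0 without printing anything.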

[19929.567055][ T0] BUG: KASAN: out-of-bounds in flush_smp_call_function_queue+0x65f/0x7c0
csd_lock_record at kernel/smp.c:119
(inlined by) flush_smp_call_function_queue at kernel/smp.c:395
[19929.575391][ T0] Read of size 8 at addr ffffc900320879b8 by task swapper/35/0
[19929.582845][ T0]
[19929.585060][ T0] CPU: 35 PID: 0 Comm: swapper/35 Tainted: G O 5.8.0-rc3-next-20200706 #1
[19929.594784][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[19929.604072][ T0] Call Trace:
[19929.607253][ T0] dump_stack+0x9d/0xe0
[19929.611304][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
[19929.617355][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
[19929.623415][ T0] print_address_description.constprop.8.cold.9+0x56/0x4fc
[19929.630521][ T0] ? log_store.cold.32+0x11/0x11
[19929.635353][ T0] ? lock_downgrade+0x720/0x720
[19929.640097][ T0] ? nr_iowait_cpu+0x78/0xf0
[19929.644576][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
[19929.650625][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
[19929.656674][ T0] kasan_report.cold.10+0x37/0x7c
[19929.661587][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
[19929.667647][ T0] flush_smp_call_function_queue+0x65f/0x7c0
[19929.673535][ T0] flush_smp_call_function_from_idle+0x41/0x71
[19929.679598][ T0] do_idle+0x2d6/0x4f0
[19929.683557][ T0] ? arch_cpu_idle_exit+0x40/0x40
[19929.688480][ T0] cpu_startup_entry+0x14/0x16
[19929.693143][ T0] secondary_startup_64+0xb6/0xc0
[19929.698059][ T0]
[19929.700270][ T0]
[19929.702476][ T0] Memory state around the buggy address:
[19929.708007][ T0] ffffc90032087880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[19929.715986][ T0] ffffc90032087900: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
[19929.723963][ T0] >ffffc90032087980: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
[19929.731940][ T0] ^
[19929.737999][ T0] ffffc90032087a00: 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
[19929.745982][ T0] ffffc90032087a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
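
For reading the memory-state dump above: generic KASAN tracks 8 bytes of
memory per shadow byte, and each row of the dump shows 16 shadow bytes,
i.e. 128 bytes of memory, which matches the 0x80 step between the row
labels. A small user-space sketch of locating the shadow byte that covers
a given address follows. The helper name shadow_column() and the macros
are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT	3
/* The report above prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW	16

/*
 * Given the address printed as a row label (row_base) and an address
 * inside the memory covered by that row, return the zero-based column of
 * the shadow byte describing that address.
 */
static unsigned int shadow_column(uint64_t row_base, uint64_t addr)
{
	uint64_t row_span = (uint64_t)SHADOW_BYTES_PER_ROW << SHADOW_SCALE_SHIFT;

	/* The address must fall inside the 128 bytes covered by this row. */
	assert(addr >= row_base && addr - row_base < row_span);

	return (unsigned int)((addr - row_base) >> SHADOW_SCALE_SHIFT);
}
```

With the row label ffffc90032087980 and the reported address
ffffc900320879b8, the offset is 0x38, so shadow_column() returns 7
(counting from 0). That is the eighth byte of the row marked with ">".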