Re: [kernel/smp] 5408b78b7a: BUG:KASAN:out-of-bounds_in_c

From: Paul E. McKenney
Date: Mon Jul 06 2020 - 19:12:51 EST


On Mon, Jul 06, 2020 at 02:49:41PM -0400, Qian Cai wrote:
> On Sun, Jul 05, 2020 at 10:37:03AM -0700, Paul E. McKenney wrote:
> > Good catch, but someone beat you to it. This commit contains the fix:
> >
> > 0504bc41a62c ("kernel/smp: Provide CSD lock timeout diagnostics")
>
> Well, I can still reproduce this on next-20200706 which contains the said fix.
>
> CSD_LOCK_WAIT_DEBUG=n

Indeed you can, good catch, and thank you!

There was a csd_lock_record(csd) that instead needed to be
csd_lock_record(NULL). A fix is in progress.
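
For readers following along, here is a minimal userspace sketch of the pattern involved. It is not the kernel code: struct request, record_request(), handle_request(), bump() and demo() are hypothetical stand-ins, and the reading that the csd can no longer be safely dereferenced once its function has run is an assumption drawn from the KASAN report below, not something stated in this thread. The point is only that the "record" helper dereferences a non-NULL argument, so the call made after the handler has finished should pass NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for call_single_data_t; not the kernel's definition. */
struct request {
	void (*func)(void *info);
	void *info;
};

/* Stand-ins for per-CPU "request currently being handled" diagnostics. */
static struct request *cur_req;
static void (*cur_req_func)(void *info);
static void *cur_req_info;

/* Record the request being handled, or clear the record when req is NULL. */
static void record_request(struct request *req)
{
	if (!req) {
		cur_req = NULL;		/* No dereference on this path. */
		return;
	}
	cur_req_func = req->func;	/* Dereferences req: it must still be live. */
	cur_req_info = req->info;
	cur_req = req;
}

/* Handle one request whose owner may reuse or release it once we are done. */
static void handle_request(struct request *req)
{
	void (*func)(void *info) = req->func;
	void *info = req->info;

	record_request(req);	/* Before the call: req is still valid here. */
	func(info);
	/* Assumed: from this point on, req may no longer be safe to read. */
	record_request(NULL);	/* Clear the record without touching req again. */
}

/* Trivial handler used by the demo: increments the int that info points to. */
static void bump(void *info)
{
	*(int *)info += 1;
}

/* Runs one on-stack request through handle_request() and returns the count. */
static int demo(void)
{
	int counter = 0;
	struct request req = { .func = bump, .info = &counter };

	handle_request(&req);
	assert(cur_req == NULL);	/* The record was cleared, not re-armed. */
	return counter;
}
```

Calling demo() runs the handler once and leaves cur_req cleared. Replacing the second record_request(NULL) with record_request(req) would reread req->func and req->info after the handler has completed, which is the shape of the mistake described above.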

Thanx, Paul

> commit 0504bc41a62c4a42b9316244da7208feca7295cb
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date: Tue Jun 30 13:22:54 2020 -0700
>
> kernel/smp: Provide CSD lock timeout diagnostics
>
> This commit causes csd_lock_wait() to emit diagnostics when a CPU fails
> to respond quickly enough to one of the smp_call_function() family of
> function calls. These diagnostics include NMI stack traces, and so the
> exclusion of idle CPUs is also removed. These diagnostics are enabled
> by a new CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.
>
> This commit was inspired by an earlier patch by Josef Bacik.
>
> [ paulmck: Avoid 64-bit divides per kernel test robot feedback. ]
> [ paulmck: Fix for syzbot+0f719294463916a3fc0e@xxxxxxxxxxxxxxxxxxxxxxxxx ]
> Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@xxxxxxxxxx
> Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@xxxxxxxxxx
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> [19929.567055][ T0] BUG: KASAN: out-of-bounds in flush_smp_call_function_queue+0x65f/0x7c0
> csd_lock_record at kernel/smp.c:119
> (inlined by) flush_smp_call_function_queue at kernel/smp.c:395
> [19929.575391][ T0] Read of size 8 at addr ffffc900320879b8 by task swapper/35/0
> [19929.582845][ T0]
> [19929.585060][ T0] CPU: 35 PID: 0 Comm: swapper/35 Tainted: G O 5.8.0-rc3-next-20200706 #1
> [19929.594784][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> [19929.604072][ T0] Call Trace:
> [19929.607253][ T0] dump_stack+0x9d/0xe0
> [19929.611304][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.617355][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.623415][ T0] print_address_description.constprop.8.cold.9+0x56/0x4fc
> [19929.630521][ T0] ? log_store.cold.32+0x11/0x11
> [19929.635353][ T0] ? lock_downgrade+0x720/0x720
> [19929.640097][ T0] ? nr_iowait_cpu+0x78/0xf0
> [19929.644576][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.650625][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.656674][ T0] kasan_report.cold.10+0x37/0x7c
> [19929.661587][ T0] ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.667647][ T0] flush_smp_call_function_queue+0x65f/0x7c0
> [19929.673535][ T0] flush_smp_call_function_from_idle+0x41/0x71
> [19929.679598][ T0] do_idle+0x2d6/0x4f0
> [19929.683557][ T0] ? arch_cpu_idle_exit+0x40/0x40
> [19929.688480][ T0] cpu_startup_entry+0x14/0x16
> [19929.693143][ T0] secondary_startup_64+0xb6/0xc0
> [19929.698059][ T0]
> [19929.700270][ T0]
> [19929.702476][ T0] Memory state around the buggy address:
> [19929.708007][ T0] ffffc90032087880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [19929.715986][ T0] ffffc90032087900: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
> [19929.723963][ T0] >ffffc90032087980: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
> [19929.731940][ T0] ^
> [19929.737999][ T0] ffffc90032087a00: 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
> [19929.745982][ T0] ffffc90032087a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00