Re: [linux-next:master] [printk] 8bdbdd7f43: BUG:scheduling_while_atomic

From: Paul E. McKenney
Date: Thu Dec 01 2022 - 15:30:09 EST


On Thu, Dec 01, 2022 at 05:12:44PM +0100, Petr Mladek wrote:
> On Thu 2022-12-01 05:27:13, Zhang, Qiang1 wrote:
> > Greeting,
> >
> > FYI, we noticed BUG:scheduling_while_atomic due to commit (built with clang-14):
> >
> > commit: 8bdbdd7f43cd74c7faca6add8a62d541503ae21d ("printk: Prepare for SRCU console list protection")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > in testcase: boot
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > [ 8.561823][ T0] BUG: scheduling while atomic: swapper/0/0x00000002
> > [ 8.569154][ T0] no locks held by swapper/0.
> > [ 8.571934][ T0] Modules linked in:
> > [ 8.573001][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce
> > [ 8.576740][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > [ 8.579942][ T0] Call Trace:
> > [ 8.581143][ T0] <TASK>
> > [ 8.582054][ T0] dump_stack_lvl (??:?)
> > [ 8.583312][ T0] ? netdev_notice (??:?)
> > [ 8.584753][ T0] ? lockdep_print_held_locks (lockdep.c:?)
> > [ 8.586563][ T0] __schedule_bug (core.c:?)
> > [ 8.588171][ T0] ? trace_sched_switch (core.c:?)
> > [ 8.589753][ T0] ? save_trace (lockdep.c:?)
> > [ 8.591135][ T0] schedule_debug (core.c:?)
> > [ 8.592582][ T0] __schedule (core.c:?)
> > [ 8.593902][ T0] ? __sched_text_start (core.c:?)
> > [ 8.595356][ T0] ? add_chain_block (lockdep.c:?)
> > [ 8.596847][ T0] ? find_held_lock (lockdep.c:?)
> > [ 8.598368][ T0] schedule (??:?)
> > [ 8.599564][ T0] schedule_timeout (??:?)
> > [ 8.600937][ T0] ? console_conditional_schedule (??:?)
> > [ 8.602773][ T0] do_wait_for_common (build_utility.c:?)
> > [ 8.604522][ T0] ? console_conditional_schedule (??:?)
> > [ 8.606462][ T0] ? bit_wait_io_timeout (build_utility.c:?)
> > [ 8.608196][ T0] ? _raw_spin_lock_irq (??:?)
> > [ 8.609935][ T0] ? lockdep_hardirqs_on (??:?)
> > [ 8.611646][ T0] wait_for_completion (??:?)
> > [ 8.613253][ T0] synchronize_srcu (??:?)
>
> Hmm, I do not understand it. It seems that this is called
> in normal context. There was console_unlock() right
> before synchronize_srcu(). It means that we took
> semaphore in the same context.
>
> Also all the calls above looks correct. wait_for_completion()
> seems to be called in normal context as well.

Is it possible that this is happening between the time that the scheduler
is running, but before workqueues are fully up and running?

If so, I can adjust synchronize_srcu() to operate during that time.
But I would need confirmation that this in fact the situation.

Thanx, Paul

> Best Regards,
> Petr
>
> > [ 8.614825][ T0] ? srcu_gp_start_if_needed (??:?)
> > [ 8.616664][ T0] ? rcu_read_lock_any_held (??:?)
> > [ 8.618281][ T0] ? console_trylock_spinning (??:?)
> > [ 8.620181][ T0] unregister_console (??:?)
> > [ 8.621875][ T0] register_console (??:?)
> > [ 8.623401][ T0] ? serial8250_isa_init_ports (8250_core.c:?)
> > [ 8.625281][ T0] ? parse_options (super.c:?)
> > [ 8.626887][ T0] univ8250_console_init (8250_core.c:?)
> > [ 8.628583][ T0] console_init (??:?)
> > [ 8.630025][ T0] start_kernel (??:?)
> > [ 8.631558][ T0] secondary_startup_64_no_verify (??:?)
> > [ 8.633502][ T0] </TASK>
> > [ 8.634624][ T0] ------------[ cut here ]------------
> > [ 8.636289][ T0] releasing a pinned lock
> > [ 8.638693][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5352 lock_release (??:?)
> > [ 8.641591][ T0] Modules linked in:
> > [ 8.642864][ T0] CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce
> > [ 8.646469][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > [ 8.649578][ T0] RIP: 0010:lock_release (??:?)
> > [ 8.651243][ T0] Code: 00 00 e9 0f fe ff ff 48 83 05 06 f6 ff 06 01 e8 91 4e 2d 03 e9 67 fe ff ff 48 c7 c7 20 8c cf 84 e8 90 3f ec ff 48 8b 54 24 08 <0f> 0b 48 83 05 a9 f6 ff 06 01 e9 66 fc ff ff e8 67 c9 99 01 85 c0
> > All code
> > ========
> > 0: 00 00 add %al,(%rax)
> > 2: e9 0f fe ff ff jmpq 0xfffffffffffffe16
> > 7: 48 83 05 06 f6 ff 06 addq $0x1,0x6fff606(%rip) # 0x6fff615
> > e: 01
> > f: e8 91 4e 2d 03 callq 0x32d4ea5
> > 14: e9 67 fe ff ff jmpq 0xfffffffffffffe80
> > 19: 48 c7 c7 20 8c cf 84 mov $0xffffffff84cf8c20,%rdi
> > 20: e8 90 3f ec ff callq 0xffffffffffec3fb5
> > 25: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
> > 2a:* 0f 0b ud2 <-- trapping instruction
> > 2c: 48 83 05 a9 f6 ff 06 addq $0x1,0x6fff6a9(%rip) # 0x6fff6dd
> > 33: 01
> > 34: e9 66 fc ff ff jmpq 0xfffffffffffffc9f
> > 39: e8 67 c9 99 01 callq 0x199c9a5
> > 3e: 85 c0 test %eax,%eax
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 0f 0b ud2
> > 2: 48 83 05 a9 f6 ff 06 addq $0x1,0x6fff6a9(%rip) # 0x6fff6b3
> > 9: 01
> > a: e9 66 fc ff ff jmpq 0xfffffffffffffc75
> > f: e8 67 c9 99 01 callq 0x199c97b
> > 14: 85 c0 test %eax,%eax
> > [ 8.656638][ T0] RSP: 0000:ffffffff862077c0 EFLAGS: 00010086
> > [ 8.658249][ T0] RAX: 0000000000000017 RBX: ffffffff86244244 RCX: ffffffff8631a420
> > [ 8.660383][ T0] RDX: ffffffff86244218 RSI: 0000000000000008 RDI: 0000000000000001
> > [ 8.662798][ T0] RBP: ffffffff862078e0 R08: dffffc0000000000 R09: 00000000035545d9
> > [ 8.665205][ T0] R10: dffff7fff0c40e7a R11: 0000000000000001 R12: ffffffff862fcf58
> > [ 8.667420][ T0] R13: 1ffffffff0c40f06 R14: ffffffff86244220 R15: dffffc0000000000
> > [ 8.669824][ T0] FS: 0000000000000000(0000) GS:ffffffff862ab000(0000) knlGS:0000000000000000
> > [ 8.672552][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 8.674557][ T0] CR2: ffff88843ffff000 CR3: 0000000006235000 CR4: 00000000000406b0
> > [ 8.677082][ T0] Call Trace:
> > [ 8.678220][ T0] <TASK>
> > [ 8.679230][ T0] ? __lock_acquire (??:?)
> > [ 8.680709][ T0] ? kvm_sched_clock_read (kvmclock.c:?)
> > [ 8.682309][ T0] ? sched_clock_cpu (??:?)
> > [ 8.683854][ T0] ? kvm_sched_clock_read (kvmclock.c:?)
> > [ 8.685268][ T0] _raw_spin_unlock (??:?)
> > [ 8.686730][ T0] dequeue_task_idle (build_policy.c:?)
> > [ 8.688259][ T0] ? set_next_task_idle (build_policy.c:?)
> > [ 8.689822][ T0] ? update_rq_clock (??:?)
> > [ 8.691387][ T0] __schedule (core.c:?)
> > [ 8.692782][ T0] ? __sched_text_start (core.c:?)
> > [ 8.694287][ T0] ? add_chain_block (lockdep.c:?)
> > [ 8.695869][ T0] ? find_held_lock (lockdep.c:?)
> > [ 8.697314][ T0] schedule (??:?)
> > [ 8.698529][ T0] schedule_timeout (??:?)
> > [ 8.701929][ T0] ? console_conditional_schedule (??:?)
> > [ 8.703826][ T0] do_wait_for_common (build_utility.c:?)
> > [ 8.705395][ T0] ? console_conditional_schedule (??:?)
> > [ 8.707189][ T0] ? bit_wait_io_timeout (build_utility.c:?)
> > [ 8.708779][ T0] ? _raw_spin_lock_irq (??:?)
> > [ 8.710371][ T0] ? lockdep_hardirqs_on (??:?)
> > [ 8.711982][ T0] wait_for_completion (??:?)
> > [ 8.713487][ T0] synchronize_srcu (??:?)
> > [ 8.715020][ T0] ? srcu_gp_start_if_needed (??:?)
> > [ 8.716772][ T0] ? rcu_read_lock_any_held (??:?)
> > [ 8.718477][ T0] ? console_trylock_spinning (??:?)
> > [ 8.720179][ T0] unregister_console (??:?)
> > [ 8.721644][ T0] register_console (??:?)
> > [ 8.728846][ T0] ? serial8250_isa_init_ports (8250_core.c:?)
> > [ 8.730692][ T0] ? parse_options (super.c:?)
> > [ 8.732180][ T0] univ8250_console_init (8250_core.c:?)
> > [ 8.733762][ T0] console_init (??:?)
> > [ 8.735096][ T0] start_kernel (??:?)
> > [ 8.736425][ T0] secondary_startup_64_no_verify (??:?)
> > [ 8.738088][ T0] </TASK>
> > [ 8.739057][ T0] irq event stamp: 494
> > [ 8.740376][ T0] hardirqs last enabled at (493): dump_stack_lvl (??:?)
> > [ 8.743081][ T0] hardirqs last disabled at (494): __schedule (core.c:?)
> > [ 8.745797][ T0] softirqs last enabled at (0): 0x0
> > [ 8.747882][ T0] softirqs last disabled at (0): 0x0
> > [ 8.749889][ T0] ---[ end trace 0000000000000000 ]---
> > [ 8.751520][ T0] bad: scheduling from the idle thread!
> > [ 8.753351][ T0] CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce
> > [ 8.757566][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > [ 8.760783][ T0] Call Trace:
> > [ 8.761970][ T0] <TASK>
> > [ 8.763067][ T0] dump_stack_lvl (??:?)
> > [ 8.764611][ T0] ? netdev_notice (??:?)
> > [ 8.766213][ T0] ? lockdep_hardirqs_on_prepare (??:?)
> > [ 8.768214][ T0] ? print_irqtrace_events (??:?)
> > [ 8.769911][ T0] ? kvm_sched_clock_read (kvmclock.c:?)
> > [ 8.771550][ T0] dequeue_task_idle (build_policy.c:?)
> > [ 8.773087][ T0] ? set_next_task_idle (build_policy.c:?)
> > [ 8.774647][ T0] ? update_rq_clock (??:?)
> > [ 8.776280][ T0] __schedule (core.c:?)
> > [ 8.777729][ T0] ? __sched_text_start (core.c:?)
> > [ 8.779238][ T0] ? add_chain_block (lockdep.c:?)
> > [ 8.780922][ T0] ? find_held_lock (lockdep.c:?)
> > [ 8.782560][ T0] schedule (??:?)
> > [ 8.783988][ T0] schedule_timeout (??:?)
> > [ 8.785515][ T0] ? console_conditional_schedule (??:?)
> > [ 8.787500][ T0] do_wait_for_common (build_utility.c:?)
> > [ 8.789162][ T0] ? console_conditional_schedule (??:?)
> > [ 8.790997][ T0] ? bit_wait_io_timeout (build_utility.c:?)
> > [ 8.792567][ T0] ? _raw_spin_lock_irq (??:?)
> > [ 8.794292][ T0] ? lockdep_hardirqs_on (??:?)
> > [ 8.796008][ T0] wait_for_completion (??:?)
> > [ 8.797520][ T0] synchronize_srcu (??:?)
> > [ 8.799001][ T0] ? srcu_gp_start_if_needed (??:?)
> > [ 8.800696][ T0] ? rcu_read_lock_any_held (??:?)
> > [ 8.802508][ T0] ? console_trylock_spinning (??:?)
> > [ 8.804316][ T0] unregister_console (??:?)
> > [ 8.805810][ T0] register_console (??:?)
> > [ 8.807364][ T0] ? serial8250_isa_init_ports (8250_core.c:?)
> > [ 8.811167][ T0] ? parse_options (super.c:?)
> > [ 8.812693][ T0] univ8250_console_init (8250_core.c:?)
> > [ 8.814269][ T0] console_init (??:?)
> > [ 8.815765][ T0] start_kernel (??:?)
> > [ 8.817296][ T0] secondary_startup_64_no_verify (??:?)
> > [ 8.819246][ T0] </TASK>
> >
> >
> >
> > Through the config file, it can be found that the following options are enabled
> >
> > CONFIG_TINY_RCU=y
> > CONFIG_RCU_EXPERT=y
> > CONFIG_SRCU=y
> > CONFIG_TINY_SRCU=y
> >
> > This problem should have been fixed:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0bf43fcbf8ebbf84a2b1b8770eaefcdb4a99afd6
> >
> > commit dbc6ca150842650d513705f26e3e6b7a4e182ce9
> > Author: Zqiang <qiang1.zhang@xxxxxxxxx>
> > Date: Wed Nov 9 15:36:38 2022 +0800
> >
> > srcu: Make Tiny synchronize_srcu() check for readers
> >
> > This commit adds lockdep checks for illegal use of synchronize_srcu()
> > within same-type SRCU read-side critical sections and within normal
> > RCU read-side critical sections. It also makes synchronize_srcu()
> > be a no-op during early boot.
> >
> > These changes bring Tiny synchronize_srcu() into line with both Tree
> > synchronize_srcu() and Tiny synchronize_rcu().
> >
> > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> >
> > Thanks
> > Zqiang
> >
> >
> >
> >
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > | Link: https://lore.kernel.org/oe-lkp/202211302358.ef0db537-yujie.liu@xxxxxxxxx
> >
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-6.1.0-rc1-00015-g8bdbdd7f43cd .config
> > make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> > make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> > cd <mod-install-dir>
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
> >
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://01.org/lkp