[rt-devel:linux-6.6.y-rt-rebase] [panic] 768c33033e: BUG:workqueue_lockup-pool

From: kernel test robot
Date: Fri Nov 17 2023 - 03:54:22 EST




Hello,

kernel test robot noticed "BUG:workqueue_lockup-pool" on:

commit: 768c33033e41ad195a9304ebb183afb730d5ae98 ("panic: Mark emergency section in warn")
https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git linux-6.6.y-rt-rebase

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+----------------------------------------------------------------------------------+------------+------------+
| | fd2ace830c | 768c33033e |
+----------------------------------------------------------------------------------+------------+------------+
| BUG:workqueue_lockup-pool | 0 | 12 |
| INFO:rcu_sched_self-detected_stall_on_CPU | 0 | 12 |
| EIP:desc_read | 0 | 12 |
+----------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@xxxxxxxxx


[ 64.476562][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 39s!
[ 64.495685][ C0] Showing busy workqueues and worker pools:
[ 64.496091][ C0] workqueue events: flags=0x0
[ 64.496427][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 64.496432][ C0] pending: deferred_probe_timeout_work_func
[ 64.496442][ C0] workqueue events_power_efficient: flags=0x82
[ 64.497853][ C0] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=2
[ 64.497856][ C0] pending: do_cache_clean
[ 64.497862][ C0] workqueue mm_percpu_wq: flags=0x8
[ 64.499031][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 64.499034][ C0] pending: vmstat_update
[ 64.499044][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 95.196556][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 70s!
[ 95.197232][ C0] Showing busy workqueues and worker pools:
[ 95.197640][ C0] workqueue events: flags=0x0
[ 95.197963][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 95.197969][ C0] pending: deferred_probe_timeout_work_func
[ 95.197976][ C0] workqueue events_power_efficient: flags=0x82
[ 95.199381][ C0] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=2
[ 95.199384][ C0] pending: do_cache_clean
[ 95.199390][ C0] workqueue mm_percpu_wq: flags=0x8
[ 95.200580][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 95.200583][ C0] pending: vmstat_update
[ 95.200593][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 125.916557][ C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 101s!
[ 125.917235][ C0] Showing busy workqueues and worker pools:
[ 125.917650][ C0] workqueue events: flags=0x0
[ 125.917979][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 125.917984][ C0] pending: deferred_probe_timeout_work_func
[ 125.917992][ C0] workqueue events_power_efficient: flags=0x82
[ 125.919409][ C0] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=2
[ 125.919412][ C0] pending: do_cache_clean
[ 125.919419][ C0] workqueue mm_percpu_wq: flags=0x8
[ 125.920619][ C0] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 125.920622][ C0] pending: vmstat_update
[ 125.920632][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 129.568547][ C1] rcu: INFO: rcu_sched self-detected stall on CPU
[ 129.568553][ C1] rcu: 1-....: (26250 ticks this GP) idle=3c24/1/0x40000000 softirq=5858/5934 fqs=13122
[ 129.568558][ C1] rcu: (t=26251 jiffies g=301 q=598 ncpus=2)
[ 129.568562][ C1] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W N 6.6.0-00134-g768c33033e41 #1
[ 129.568565][ C1] EIP: desc_read (kernel/printk/printk_ringbuffer.c:439)
[ 129.568573][ C1] Code: fb 83 c4 08 5b 5e 5f 5d c3 8d 74 26 00 0f 0b eb d2 3e 8d 74 26 00 55 89 e5 57 89 cf 56 53 89 c3 83 ec 08 8b 08 b8 ff ff ff ff <89> 55 f0 d3 e0 f7 d0 21 d0 6b f0 58 83 f9 1f 0f 87 a7 e6 83 00 03
All code
========
0: fb sti
1: 83 c4 08 add $0x8,%esp
4: 5b pop %rbx
5: 5e pop %rsi
6: 5f pop %rdi
7: 5d pop %rbp
8: c3 ret
9: 8d 74 26 00 lea 0x0(%rsi,%riz,1),%esi
d: 0f 0b ud2
f: eb d2 jmp 0xffffffffffffffe3
11: 3e 8d 74 26 00 ds lea 0x0(%rsi,%riz,1),%esi
16: 55 push %rbp
17: 89 e5 mov %esp,%ebp
19: 57 push %rdi
1a: 89 cf mov %ecx,%edi
1c: 56 push %rsi
1d: 53 push %rbx
1e: 89 c3 mov %eax,%ebx
20: 83 ec 08 sub $0x8,%esp
23: 8b 08 mov (%rax),%ecx
25: b8 ff ff ff ff mov $0xffffffff,%eax
2a:* 89 55 f0 mov %edx,-0x10(%rbp) <-- trapping instruction
2d: d3 e0 shl %cl,%eax
2f: f7 d0 not %eax
31: 21 d0 and %edx,%eax
33: 6b f0 58 imul $0x58,%eax,%esi
36: 83 f9 1f cmp $0x1f,%ecx
39: 0f 87 a7 e6 83 00 ja 0x83e6e6
3f: 03 .byte 0x3

Code starting with the faulting instruction
===========================================
0: 89 55 f0 mov %edx,-0x10(%rbp)
3: d3 e0 shl %cl,%eax
5: f7 d0 not %eax
7: 21 d0 and %edx,%eax
9: 6b f0 58 imul $0x58,%eax,%esi
c: 83 f9 1f cmp $0x1f,%ecx
f: 0f 87 a7 e6 83 00 ja 0x83e6bc
15: 03 .byte 0x3
[ 129.568575][ C1] EAX: ffffffff EBX: c1d5cd60 ECX: 0000000f EDX: 3fff8001
[ 129.568576][ C1] ESI: 00000001 EDI: c320fd78 EBP: c320fd28 ESP: c320fd14
[ 129.568578][ C1] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00000286
[ 129.568582][ C1] CR0: 80050033 CR2: b7d77b93 CR3: 022b4000 CR4: 00040690
[ 129.568583][ C1] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 129.568584][ C1] DR6: fffe0ff0 DR7: 00000400
[ 129.568585][ C1] Call Trace:
[ 129.568589][ C1] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 129.568596][ C1] ? dump_cpu_task (kernel/sched/core.c:11572)
[ 129.568602][ C1] ? rcu_dump_cpu_stacks (include/linux/cpumask.h:143 include/linux/cpumask.h:150 include/linux/cpumask.h:212 kernel/rcu/tree_stall.h:367)
[ 129.568606][ C1] ? vprintk (kernel/printk/printk_safe.c:46)
[ 129.568611][ C1] ? print_cpu_stall (kernel/rcu/tree_stall.h:692)
[ 129.568616][ C1] ? check_cpu_stall (kernel/rcu/tree_stall.h:775)
[ 129.568620][ C1] ? rcu_sched_clock_irq (kernel/rcu/tree.c:3869 kernel/rcu/tree.c:2240)
[ 129.568623][ C1] ? update_process_times (arch/x86/include/asm/preempt.h:27 kernel/time/timer.c:2080)
[ 129.568628][ C1] ? tick_sched_timer (kernel/time/tick-sched.c:1497)
[ 129.568632][ C1] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1688 kernel/time/hrtimer.c:1752)
[ 129.568636][ C1] ? cpumask_weight (arch/x86/kernel/alternative.c:1766)
[ 129.568640][ C1] ? hrtimer_interrupt (kernel/time/hrtimer.c:1817)
[ 129.568645][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074)
[ 129.568650][ C1] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:444 include/linux/jump_label.h:260 include/linux/jump_label.h:270 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1081)
[ 129.568653][ C1] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074)
[ 129.568656][ C1] ? handle_exception (arch/x86/entry/entry_32.S:1049)
[ 129.568662][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074)
[ 129.568664][ C1] ? desc_read (kernel/printk/printk_ringbuffer.c:439)
[ 129.568668][ C1] ? sysvec_call_function_single (arch/x86/kernel/apic/apic.c:1074)
[ 129.568670][ C1] ? desc_read (kernel/printk/printk_ringbuffer.c:439)
[ 129.568675][ C1] desc_read_finalized_seq (kernel/printk/printk_ringbuffer.c:1907)
[ 129.568680][ C1] prb_next_reserve_seq (kernel/printk/printk_ringbuffer.c:2084)
[ 129.568682][ C1] ? nbcon_get_cpu_emergency_nesting (kernel/printk/nbcon.c:984)
[ 129.568689][ C1] nbcon_atomic_flush_all (kernel/printk/nbcon.c:1133)
[ 129.568692][ C1] ? nbcon_atomic_flush_all (kernel/printk/nbcon.c:1133)
[ 129.568695][ C1] printk_trigger_flush (arch/x86/include/asm/preempt.h:85 kernel/printk/printk.c:3993)
[ 129.568698][ C1] nbcon_cpu_emergency_exit (kernel/printk/nbcon.c:1193)
[ 129.568701][ C1] ? refcount_warn_saturate (lib/refcount.c:28 (discriminator 3))
[ 129.568706][ C1] __warn (kernel/panic.c:692)
[ 129.568711][ C1] ? refcount_warn_saturate (lib/refcount.c:28 (discriminator 3))
[ 129.568714][ C1] report_bug (lib/bug.c:201 lib/bug.c:219)
[ 129.568720][ C1] ? exc_overflow (arch/x86/kernel/traps.c:250)
[ 129.568722][ C1] handle_bug (arch/x86/kernel/traps.c:216)
[ 129.568724][ C1] exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 129.568727][ C1] handle_exception (arch/x86/entry/entry_32.S:1049)
[ 129.568729][ C1] EIP: refcount_warn_saturate (lib/refcount.c:28 (discriminator 3))
[ 129.568731][ C1] Code: 79 18 c2 01 68 a8 b9 c3 c1 e8 03 7b cf ff 0f 0b 58 c9 c3 8d b6 00 00 00 00 c6 05 86 79 18 c2 01 68 50 b9 c3 c1 e8 e7 7a cf ff <0f> 0b 59 c9 c3 66 90 89 c2 8b 00 3d 00 00 00 c0 74 12 83 f8 01 74
All code
========
0: 79 18 jns 0x1a
2: c2 01 68 ret $0x6801
5: a8 b9 test $0xb9,%al
7: c3 ret
8: c1 e8 03 shr $0x3,%eax
b: 7b cf jnp 0xffffffffffffffdc
d: ff 0f decl (%rdi)
f: 0b 58 c9 or -0x37(%rax),%ebx
12: c3 ret
13: 8d b6 00 00 00 00 lea 0x0(%rsi),%esi
19: c6 05 86 79 18 c2 01 movb $0x1,-0x3de7867a(%rip) # 0xffffffffc21879a6
20: 68 50 b9 c3 c1 push $0xffffffffc1c3b950
25: e8 e7 7a cf ff call 0xffffffffffcf7b11
2a:* 0f 0b ud2 <-- trapping instruction
2c: 59 pop %rcx
2d: c9 leave
2e: c3 ret
2f: 66 90 xchg %ax,%ax
31: 89 c2 mov %eax,%edx
33: 8b 00 mov (%rax),%eax
35: 3d 00 00 00 c0 cmp $0xc0000000,%eax
3a: 74 12 je 0x4e
3c: 83 f8 01 cmp $0x1,%eax
3f: 74 .byte 0x74

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 59 pop %rcx
3: c9 leave
4: c3 ret
5: 66 90 xchg %ax,%ax
7: 89 c2 mov %eax,%edx
9: 8b 00 mov (%rax),%eax
b: 3d 00 00 00 c0 cmp $0xc0000000,%eax
10: 74 12 je 0x24
12: 83 f8 01 cmp $0x1,%eax
15: 74 .byte 0x74


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231117/202311171611.78d41dbe-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki