rcu_sched stall while waiting in csd_lock_wait()

From: Pratyush Anand
Date: Tue Aug 01 2017 - 23:31:56 EST


Hi,

I am observing following rcu_sched stall while executing `perf record -a -- sleep 1` with one of the arm64 platform. It looks like that stalled cpu was waiting in csd_lock_wait() from where it never came out,and so the stall. Any help/pointer for further debugging would be very helpful. Problem also reproduced with 4.13.0-rc3.


[11108.758804] task: ffff801782490000 task.stack: ffff801782574000
[11108.764710] PC is at smp_call_function_single+0x128/0x178
[11108.770089] LR is at smp_call_function_single+0x11c/0x178
[11108.775470] pc : [<ffff0000081615ac>] lr : [<ffff0000081615a0>] pstate: 20400145
[11108.782848] sp : ffff801782577b80
[11108.786147] x29: ffff801782577b80 x28: ffff8017815c3800
[11108.791442] x27: ffff0000087f1000 x26: 000000000000001d
[11108.796737] x25: 0000000000000000 x24: ffff0000081ddc90
[11108.802032] x23: ffff000008c73000 x22: 0000000000000006
[11108.807327] x21: ffff0000081da548 x20: ffff801782577c18
[11108.812623] x19: ffff000008c73000 x18: 0000fffffa011c30
[11108.817918] x17: 0000000000000000 x16: 0000000000000000
[11108.823213] x15: 00000000006ed000 x14: 0000000000620170
[11108.828508] x13: ffffffffffffffff x12: 0000000000000005
[11108.833803] x11: 0000000000000000 x10: 00000000ffffffff
[11108.839098] x9 : 0000fffffa011e40 x8 : 000000000000001d
[11108.844393] x7 : 0000000000000000 x6 : ffff801782577bc8
[11108.849689] x5 : 0000000000000040 x4 : ffff801782577be0
[11108.854984] x3 : ffff801782577be0 x2 : 0000000000000000
[11108.860279] x1 : 0000000000000003 x0 : 0000000000000000
[11108.865574]
[11141.885599] INFO: rcu_sched self-detected stall on CPU
[11141.889777] 7-...: (5982 ticks this GP) idle=2fb/140000000000001/0 softirq=604/604 fqs=2995
[11141.898281] (t=6000 jiffies g=11855 c=11854 q=19196)
[11141.903404] Task dump for CPU 7:
[11141.906615] perf R running task 0 2357 2329 0x00000202
[11141.913647] Call trace:
[11141.916081] [<ffff000008088770>] dump_backtrace+0x0/0x270
[11141.921460] [<ffff000008088a04>] show_stack+0x24/0x2c
[11141.926496] [<ffff0000080fcd50>] sched_show_task+0xec/0x130
[11141.932050] [<ffff0000080fe7e8>] dump_cpu_task+0x48/0x54
[11141.937347] [<ffff0000081ee314>] rcu_dump_cpu_stacks+0xac/0xf4
[11141.943163] [<ffff000008144b94>] rcu_check_callbacks+0x7a0/0xa84
[11141.949152] [<ffff00000814ae98>] update_process_times+0x34/0x5c
[11141.955055] [<ffff00000815c7ec>] tick_sched_handle.isra.15+0x38/0x70
[11141.961390] [<ffff00000815c86c>] tick_sched_timer+0x48/0x88
[11141.966946] [<ffff00000814b84c>] __hrtimer_run_queues+0x150/0x2e4
[11141.973022] [<ffff00000814c264>] hrtimer_interrupt+0xa0/0x1d4
[11141.978754] [<ffff000008650898>] arch_timer_handler_phys+0x3c/0x48
[11141.984916] [<ffff000008137c40>] handle_percpu_devid_irq+0x98/0x210
[11141.991164] [<ffff0000081321a4>] generic_handle_irq+0x34/0x4c
[11141.996893] [<ffff000008132868>] __handle_domain_irq+0x6c/0xc4
[11142.002710] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[11142.008178] Exception stack(0xffff801782577a50 to 0xffff801782577b80)
[11142.014602] 7a40: 0000000000000000 0000000000000003
[11142.022414] 7a60: 0000000000000000 ffff801782577be0 ffff801782577be0 0000000000000040
[11142.030226] 7a80: ffff801782577bc8 0000000000000000 000000000000001d 0000fffffa011e40
[11142.038039] 7aa0: 00000000ffffffff 0000000000000000 0000000000000005 ffffffffffffffff
[11142.045851] 7ac0: 0000000000620170 00000000006ed000 0000000000000000 0000000000000000
[11142.053664] 7ae0: 0000fffffa011c30 ffff000008c73000 ffff801782577c18 ffff0000081da548
[11142.061477] 7b00: 0000000000000006 ffff000008c73000 ffff0000081ddc90 0000000000000000
[11142.069289] 7b20: 000000000000001d ffff0000087f1000 ffff8017815c3800 ffff801782577b80
[11142.077101] 7b40: ffff0000081615a0 ffff801782577b80 ffff0000081615ac 0000000020400145
[11142.084914] 7b60: 0000001b00000019 ffff801782577c18 0001000000000000 ffff801782577bc8
[11142.092727] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[11142.097588] [<ffff0000081615ac>] smp_call_function_single+0x128/0x178
[11142.104014] [<ffff0000081d8124>] cpu_function_call+0x40/0x64
[11142.109654] [<ffff0000081d837c>] event_function_call+0x100/0x108
[11142.115643] [<ffff0000081d83d0>] _perf_event_disable+0x4c/0x74
[11142.121459] [<ffff0000081d7f78>] perf_event_for_each_child+0x38/0x98
[11142.127796] [<ffff0000081e110c>] perf_ioctl+0xec/0x7b4
[11142.132920] [<ffff00000829e564>] do_vfs_ioctl+0xcc/0x7bc
[11142.138213] [<ffff00000829ece4>] SyS_ioctl+0x90/0xa4
[11142.143161] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
[11142.148717] INFO: rcu_sched detected stalls on CPUs/tasks:
[11142.154187] 7-...: (5983 ticks this GP) idle=2fb/140000000000000/0 softirq=604/604 fqs=2995
[11142.162692] (detected by 8, t=6027 jiffies, g=11855, c=11854, q=19211)
[11142.169289] Task dump for CPU 7:
[11142.172500] perf R running task 0 2357 2329 0x00000202
[11142.179531] Call trace:
[11142.181964] [<ffff0000080857c8>] __switch_to+0x64/0x70
[11142.187088] [<ffff000008c20000>] cpu_number+0x0/0x8


--
Regards
Pratyush