Re: rcu_sched self-detected stall on CPU

From: NTU
Date: Tue Sep 20 2016 - 10:34:37 EST


Well this time I got a lot more symbols for you guys! I didn't change
anything except reboot a couple times. Log is attached. I don't debug
symbols often (never done it with kernel code before) but I'll give it
a shot with gdb and report back, I'll be glad to help solve this
issue.

Alec Ari

On Thu, Sep 15, 2016 at 12:15 PM, NTU <neotheuser@xxxxxxxxx> wrote:
> Sorry!!
>
> Alec Ari
>
> On Thu, Sep 15, 2016 at 1:22 AM, Mike Galbraith
> <umgwanakikbuti@xxxxxxxxx> wrote:
>> On Wed, 2016-09-14 at 23:02 -0500, NTU wrote:
>>> [ 26.542980] Call Trace:
>>> [ 26.542983] [<ffffffffa7fbd7c1>] ? 0xffffffffa7fbd7c1
>>> [ 26.542985] [<ffffffffa7f17d35>] ? 0xffffffffa7f17d35
>>> [ 26.542986] [<ffffffffa796c115>] ? 0xffffffffa796c115
>>> [ 26.542988] [<ffffffffa7633d86>] ? 0xffffffffa7633d86
>>> [ 26.542989] [<ffffffffa796c0a6>] ? 0xffffffffa796c0a6
>> ...
>>
>> The stack trace is useless without symbols.
[ 25.304576] INFO: rcu_sched self-detected stall on CPU
[ 25.305585] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 25.305595] 3-...: (20987 ticks this GP) idle=943/140000000000001/0 softirq=146/146 fqs=5170
[ 25.305599] (detected by 1, t=21002 jiffies, g=-222, c=-223, q=6)
[ 25.305600] Task dump for CPU 3:
[ 25.305604] swapper/0 R running task 0 1 0 0x00000008
[ 25.305609] 0000000000000003 ffff8803a4bef228 0000000000015120 ffff8803a4be4780
[ 25.305613] ffffffffa0e0e600 0000000000000297 0000000000000000 0000000000000297
[ 25.305616] ffff88039dd91ed0 0000000000085d8c ffff8803a4bef220 0000000000000297
[ 25.305617] Call Trace:
[ 25.305626] [<ffffffff9fdbfe01>] ? depot_save_stack+0x181/0x610
[ 25.305630] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305635] [<ffffffff9f76e495>] ? save_stack+0xb5/0xd0
[ 25.305638] [<ffffffff9f433d86>] ? save_stack_trace+0x26/0x50
[ 25.305641] [<ffffffff9f76e426>] ? save_stack+0x46/0xd0
[ 25.305644] [<ffffffff9f76eae1>] ? kasan_slab_free+0x71/0xb0
[ 25.305646] [<ffffffff9f76c950>] ? kfree+0x80/0x140
[ 25.305649] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305653] [<ffffffff9f457b6e>] ? native_send_call_func_ipi+0x19e/0x3e0
[ 25.305657] [<ffffffff9f5d308f>] ? smp_call_function_many+0x51f/0x900
[ 25.305660] [<ffffffff9f5d3521>] ? on_each_cpu+0x31/0x50
[ 25.305664] [<ffffffff9f48e4a7>] ? flush_tlb_all+0x17/0x20
[ 25.305667] [<ffffffff9f480be2>] ? __change_page_attr_set_clr+0x4a2/0x2340
[ 25.305670] [<ffffffff9f481d85>] ? __change_page_attr_set_clr+0x1645/0x2340
[ 25.305673] [<ffffffff9f482db6>] ? change_page_attr_set_clr+0x336/0xa60
[ 25.305676] [<ffffffff9f485a64>] ? set_memory_np+0x34/0x40
[ 25.305679] [<ffffffff9f4773d2>] ? free_init_pages+0x122/0x130
[ 25.305682] [<ffffffff9f4788ea>] ? mark_rodata_ro+0x1ca/0x230
[ 25.305685] [<ffffffffa0c6b114>] ? kernel_init+0x34/0x180
[ 25.305688] [<ffffffffa0c7c00f>] ? ret_from_fork+0x1f/0x40
[ 25.305691] [<ffffffff9f76a08a>] ? alloc_debug_processing+0x5a/0x1b0
[ 25.305694] [<ffffffff9f76b06e>] ? ___slab_alloc.constprop.58+0x2de/0x420
[ 25.305697] [<ffffffff9fd1a0ce>] ? alloc_cpumask_var_node+0x5e/0xf0
[ 25.305700] [<ffffffff9f76e536>] ? kasan_unpoison_shadow+0x36/0x50
[ 25.305703] [<ffffffff9f76e536>] ? kasan_unpoison_shadow+0x36/0x50
[ 25.305706] [<ffffffff9fd4a3d8>] ? __bitmap_weight+0xd8/0x160
[ 25.305709] [<ffffffff9f76eaf7>] ? kasan_slab_free+0x87/0xb0
[ 25.305711] [<ffffffff9f76c950>] ? kfree+0x80/0x140
[ 25.305714] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305718] [<ffffffff9f457b6e>] ? native_send_call_func_ipi+0x19e/0x3e0
[ 25.305721] [<ffffffff9f4579d0>] ? native_smp_send_reschedule+0x110/0x110
[ 25.305724] [<ffffffff9fd59b59>] ? _find_next_bit+0xe9/0x120
[ 25.305726] [<ffffffff9fd59bb7>] ? find_next_bit+0x27/0x30
[ 25.305730] [<ffffffff9f5d2c08>] ? smp_call_function_many+0x98/0x900
[ 25.305733] [<ffffffff9f48d3a0>] ? flush_tlb_func+0x130/0x130
[ 25.305736] [<ffffffff9f47fd4d>] ? lookup_address_in_pgd+0x32d/0x440
[ 25.305739] [<ffffffff9f47fd4d>] ? lookup_address_in_pgd+0x32d/0x440
[ 25.305742] [<ffffffff9f480f67>] ? __change_page_attr_set_clr+0x827/0x2340
[ 25.305745] [<ffffffff9f480740>] ? print_split+0x830/0x830
[ 25.305748] [<ffffffff9fe7a688>] ? bitfill_aligned+0x148/0x1a0
[ 25.305751] [<ffffffff9f420446>] ? print_context_stack+0x76/0xe0
[ 25.305755] [<ffffffff9f420446>] ? print_context_stack+0x76/0xe0
[ 25.305758] [<ffffffff9fdbfe01>] ? depot_save_stack+0x181/0x610
[ 25.305761] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305763] [<ffffffff9f76e495>] ? save_stack+0xb5/0xd0
[ 25.305766] [<ffffffff9f433d86>] ? save_stack_trace+0x26/0x50
[ 25.305769] [<ffffffff9f76e426>] ? save_stack+0x46/0xd0
[ 25.305771] [<ffffffff9f76eae1>] ? kasan_slab_free+0x71/0xb0
[ 25.305774] [<ffffffff9f76c950>] ? kfree+0x80/0x140
[ 25.305777] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305780] [<ffffffff9f457b6e>] ? native_send_call_func_ipi+0x19e/0x3e0
[ 25.305783] [<ffffffff9f5d308f>] ? smp_call_function_many+0x51f/0x900
[ 25.305786] [<ffffffff9f5d3521>] ? on_each_cpu+0x31/0x50
[ 25.305789] [<ffffffff9f483219>] ? change_page_attr_set_clr+0x799/0xa60
[ 25.305792] [<ffffffff9f485a64>] ? set_memory_np+0x34/0x40
[ 25.305795] [<ffffffff9f4773d2>] ? free_init_pages+0x122/0x130
[ 25.305797] [<ffffffff9f4788ea>] ? mark_rodata_ro+0x1ca/0x230
[ 25.305800] [<ffffffffa0c6b114>] ? kernel_init+0x34/0x180
[ 25.305803] [<ffffffffa0c7c00f>] ? ret_from_fork+0x1f/0x40
[ 25.305805] [<ffffffff9f76ab2f>] ? deactivate_slab+0x34f/0x3c0
[ 25.305808] [<ffffffff9f76a08a>] ? alloc_debug_processing+0x5a/0x1b0
[ 25.305811] [<ffffffff9f76b06e>] ? ___slab_alloc.constprop.58+0x2de/0x420
[ 25.305814] [<ffffffff9fd1a0ce>] ? alloc_cpumask_var_node+0x5e/0xf0
[ 25.305817] [<ffffffff9f76e536>] ? kasan_unpoison_shadow+0x36/0x50
[ 25.305820] [<ffffffff9f76e536>] ? kasan_unpoison_shadow+0x36/0x50
[ 25.305823] [<ffffffff9fd4a3d8>] ? __bitmap_weight+0xd8/0x160
[ 25.305826] [<ffffffff9f76eaf7>] ? kasan_slab_free+0x87/0xb0
[ 25.305828] [<ffffffff9f76c950>] ? kfree+0x80/0x140
[ 25.305831] [<ffffffff9fd1a205>] ? free_cpumask_var+0x15/0x20
[ 25.305834] [<ffffffff9f457b6e>] ? native_send_call_func_ipi+0x19e/0x3e0
[ 25.305837] [<ffffffff9f4579d0>] ? native_smp_send_reschedule+0x110/0x110
[ 25.305840] [<ffffffff9fd59b59>] ? _find_next_bit+0xe9/0x120
[ 25.305843] [<ffffffff9fd59bb7>] ? find_next_bit+0x27/0x30
[ 25.305846] [<ffffffff9f5d2c08>] ? smp_call_function_many+0x98/0x900
[ 25.305849] [<ffffffff9f492948>] ? note_page+0xf48/0x1790
[ 25.305853] [<ffffffff9f493639>] ? ptdump_walk_pgd_level_core+0x4a9/0x710
[ 25.305856] [<ffffffff9f493190>] ? note_page+0x1790/0x1790
[ 25.305858] [<ffffffff9f485a64>] ? set_memory_np+0x34/0x40
[ 25.305862] [<ffffffff9f4938e7>] ? ptdump_walk_pgd_level_checkwx+0x17/0x20
[ 25.305865] [<ffffffff9f4788ef>] ? mark_rodata_ro+0x1cf/0x230
[ 25.305867] [<ffffffffa0c6b114>] ? kernel_init+0x34/0x180
[ 25.305870] [<ffffffffa0c7c00f>] ? ret_from_fork+0x1f/0x40
[ 25.305873] [<ffffffffa0c6b0e0>] ? rest_init+0xa0/0xa0
[ 25.614915] 3-...: (20987 ticks this GP) idle=943/140000000000001/0 softirq=146/146 fqs=5247
[ 25.617229] (t=21313 jiffies g=-222 c=-223 q=6)
[ 25.619552] Task dump for CPU 3:
[ 25.621852] swapper/0 R running task 0 1 0 0x00000008
[ 25.624211] ffffffffa1463020 000000003fbfb87f ffff8803acf87c98 ffffffff9f50d416
[ 25.626592] 0000000000000006 0000000000000010 0000000041b58ab3 ffffffffa12d8e95
[ 25.628989] ffffffff9f50d1f0 ffff8803acf87d30 ffffffff9fd59b59 0000000000000008
[ 25.631408] Call Trace:
[ 25.633840] <IRQ> [<ffffffff9f50d416>] sched_show_task+0x226/0x380
[ 25.636338] [<ffffffff9f50d1f0>] ? SyS_sched_rr_get_interval+0x250/0x250
[ 25.638860] [<ffffffff9fd59b59>] ? _find_next_bit+0xe9/0x120
[ 25.641381] [<ffffffff9f514f41>] dump_cpu_task+0x71/0x90
[ 25.643903] [<ffffffffa0c60a15>] rcu_dump_cpu_stacks+0x180/0x1a6
[ 25.646435] [<ffffffff9f594576>] rcu_check_callbacks+0x1706/0x17f0
[ 25.648966] [<ffffffff9f62662f>] ? __acct_update_integrals+0x21f/0x2a0
[ 25.651491] [<ffffffff9f59e6c0>] update_process_times+0x30/0x60
[ 25.654018] [<ffffffff9f5c8533>] tick_sched_timer+0xb3/0x420
[ 25.656541] [<ffffffff9f59edb3>] ? __remove_hrtimer+0x73/0x180
[ 25.659059] [<ffffffff9f5c8480>] ? tick_init_highres+0x20/0x20
[ 25.661571] [<ffffffff9f59f52e>] __hrtimer_run_queues+0x25e/0x4e0
[ 25.664128] [<ffffffff9f5a11dc>] hrtimer_interrupt+0x17c/0x450
[ 25.666664] [<ffffffffa0c7d8ef>] smp_trace_apic_timer_interrupt+0xdf/0x140
[ 25.669241] [<ffffffffa0c7d959>] smp_apic_timer_interrupt+0x9/0xb
[ 25.671793] [<ffffffffa0c7d19c>] apic_timer_interrupt+0x7c/0x90
[ 25.674336] <EOI> [<ffffffff9f492948>] ? note_page+0xf48/0x1790
[ 25.676946] [<ffffffff9f493639>] ptdump_walk_pgd_level_core+0x4a9/0x710
[ 25.679576] [<ffffffff9f493190>] ? note_page+0x1790/0x1790
[ 25.682197] [<ffffffff9f485a64>] ? set_memory_np+0x34/0x40
[ 25.684813] [<ffffffff9f4938e7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
[ 25.687463] [<ffffffff9f4788ef>] mark_rodata_ro+0x1cf/0x230
[ 25.690119] [<ffffffffa0c6b114>] kernel_init+0x34/0x180
[ 25.692779] [<ffffffffa0c7c00f>] ret_from_fork+0x1f/0x40
[ 25.695445] [<ffffffffa0c6b0e0>] ? rest_init+0xa0/0xa0
[ 32.049408] CPA self-test:
[ 32.253030] 4k 4187648 large 0 gb 0 x 1837[ffff880000099000-ffff880100000000] miss 268800
[ 32.457951] 4k 4187648 large 0 gb 0 x 1837[ffff880000099000-ffff880100000000] miss 268800
[ 32.662578] 4k 4187648 large 0 gb 0 x 1837[ffff880000099000-ffff880100000000] miss 268800
[ 32.665359] ok.