Re: linux 4.2.4 rcu_sched rolls over and barfs after debugger exits

From: Jeffrey Merkey
Date: Mon Oct 26 2015 - 04:00:23 EST


Also, please note that in the trace, for some strange reason, the floppy
drive activates after the rcu_sched errors happen, which is very weird.
I fixed the problem in the debugger patch, but it seems to me that the
kernel should not generate a crash report just because the hardware
clock runs over while time is frozen for that kernel instance.

Jeff

On 10/25/15, Jeffrey Merkey <jeffmerkey@xxxxxxxxx> wrote:
> After I use the mdb kernel debugger and then exit it, rcu_sched,
> because of its own internal timers, rolls over and crashes when it
> does not get the timeout window it expects. This is not caused by
> memory corruption; it is caused by the debugger holding the system
> suspended. When the system is allowed to run again, rcu_sched rolls
> over and dies.
>
> There are several things happening here: lots of bugs, Linus ...
>
> Jeff
>
> sysrq: SysRq : MDB
> INFO: rcu_sched detected stalls on CPUs/tasks:
> (detected by 0, t=41279 jiffies, g=14721, c=14720, q=5)
> All QSes seen, last rcu_sched kthread activity 41279
> (-165477--206756), jiffies_till_next_fqs=3, root ->qsmask 0x0
> NetworkManager R running 0 1703 1 0x00000080
> c0bb6a28 c046d763 c0a895d9 00000000 000006a7 00000001 00000080 f64c1140
> c0b535c0 00003981 c04a5126 c0a823a8 c0b53a91 0000a13f fffd799b fffcd85c
> 00000003 00000000 00000096 00000000 00003981 3b9aca00 00003981 00003980
> Call Trace:
> [<c046d763>] ? sched_show_task+0xb3/0x120
> [<c04a5126>] ? print_other_cpu_stall+0x276/0x2c0
> [<c04a52e0>] ? __rcu_pending+0x170/0x210
> [<c04a632f>] ? rcu_check_callbacks+0xbf/0x1a0
> [<c04a8f48>] ? update_process_times+0x28/0x50
> [<c04ba943>] ? tick_sched_handle+0x33/0x70
> [<c04baa97>] ? tick_sched_timer+0x47/0xa0
> [<c04aaefa>] ? __remove_hrtimer+0x4a/0x90
> [<c04ab656>] ? __run_hrtimer+0x66/0x180
> [<c04baa50>] ? tick_nohz_handler+0xd0/0xd0
> [<c055f5e5>] ? __vfs_read+0xc5/0xf0
> [<c04ab7f8>] ? __hrtimer_run_queues+0x88/0xc0
> [<c04ab995>] ? hrtimer_interrupt+0x85/0x170
> [<c0436746>] ? local_apic_timer_interrupt+0x26/0x50
> [<c0451655>] ? irq_enter+0x5/0x50
> [<c043679b>] ? smp_apic_timer_interrupt+0x2b/0x50
> [<c090468d>] ? apic_timer_interrupt+0x2d/0x34
> [<c0900000>] ? firmware_map_add_hotplug+0x45/0x141
> rcu_sched kthread starved for 41279 jiffies! g14721 c14720 f0x2
> fuse init (API version 7.23)
> blk_update_request: I/O error, dev fd0, sector 0
> floppy: error -5 while reading block 0
> blk_update_request: I/O error, dev fd0, sector 0
> floppy: error -5 while reading block 0
> sysrq: SysRq : MDB
> INFO: rcu_sched detected stalls on CPUs/tasks:
> (detected by 0, t=21939 jiffies, g=17972, c=17971, q=3)
> All QSes seen, last rcu_sched kthread activity 21939
> (-124010--145949), jiffies_till_next_fqs=3, root ->qsmask 0x0
> rtkit-daemon R running 0 2878 1 0x00000080
> c0bb6a28 c046d763 c0a895d9 00000000 00000b3e 00000001 00000080 f64c1140
> c0b535c0 00004634 c04a5126 c0a823a8 c0b53a91 000055b3 fffe1b96 fffdc5e3
> 00000003 00000000 00000086 00000000 00004634 f69ec5cc 00004634 00004633
> Call Trace:
> [<c046d763>] ? sched_show_task+0xb3/0x120
> [<c04a5126>] ? print_other_cpu_stall+0x276/0x2c0
> [<c04a52e0>] ? __rcu_pending+0x170/0x210
> [<c04a632f>] ? rcu_check_callbacks+0xbf/0x1a0
> [<c04a8f48>] ? update_process_times+0x28/0x50
> [<c04ba943>] ? tick_sched_handle+0x33/0x70
> [<c04baa97>] ? tick_sched_timer+0x47/0xa0
> [<c04aaefa>] ? __remove_hrtimer+0x4a/0x90
> [<c04ab656>] ? __run_hrtimer+0x66/0x180
> [<c04baa50>] ? tick_nohz_handler+0xd0/0xd0
> [<c083a719>] ? __kmalloc_reserve+0x29/0x80
> [<c04ab7f8>] ? __hrtimer_run_queues+0x88/0xc0
> [<c04ab995>] ? hrtimer_interrupt+0x85/0x170
> [<c0486507>] ? __wake_up_common+0x47/0x70
> [<c0436746>] ? local_apic_timer_interrupt+0x26/0x50
> [<c0451655>] ? irq_enter+0x5/0x50
> [<c043679b>] ? smp_apic_timer_interrupt+0x2b/0x50
> [<c090468d>] ? apic_timer_interrupt+0x2d/0x34
> [<c05689b0>] ? legitimize_path+0x50/0x50
> [<c056b8e5>] ? lookup_fast+0x155/0x2d0
> [<c0568fbd>] ? generic_permission+0xcd/0x100
> [<c056ba9a>] ? walk_component+0x3a/0x1f0
> [<c08334f5>] ? SYSC_sendto+0x125/0x150
> [<c056d1a6>] ? path_lookupat+0x56/0xf0
> [<c056d48b>] ? filename_lookup+0x8b/0x150
> [<f9cd02c2>] ? nl80211_send_bss.clone.4+0xe2/0x490 [cfg80211]
> [<c056946e>] ? getname_flags+0x3e/0x1b0
> [<c056948d>] ? getname_flags+0x5d/0x1b0
> [<c05641fe>] ? vfs_fstatat+0x4e/0xa0
> [<c0564308>] ? vfs_stat+0x18/0x20
> [<c056464a>] ? SyS_stat64+0x1a/0x40
> [<c0834535>] ? SyS_socketcall+0x235/0x300
> [<c04da94c>] ? __audit_syscall_entry+0x9c/0x100
> [<c0903b48>] ? sysenter_do_call+0x12/0x12
> rcu_sched kthread starved for 21939 jiffies! g17972 c17971 f0x2
> [root@aya ~]#
>