Re: [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds

From: David Miller
Date: Wed Apr 23 2008 - 19:23:20 EST


From: Ingo Molnar <mingo@xxxxxxx>
Date: Wed, 23 Apr 2008 15:36:56 +0200

> as a temporary workaround please try the patch below, until we can
> reproduce and fix the bug.

Yeah, if you basically turn off the code paths, that particular set of
problems goes away :-/

So now we're on to the next bug: CPUs getting wedged in the group
aggregate code.

I'll try Peter's patches which were posted today.

[ 760.218048] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0]
[ 760.218292] TSTATE: 0000000080001603 TPC: 000000000054e0c0 TNPC: 000000000054e0c4 Y: 00000000 Not tainted
[ 760.218325] TPC: <find_next_bit+0xe4/0x11c>
[ 760.218336] g0: 0000000000009000 g1: 0000000000000000 g2: ffffffffffffffff g3: 0000000000000030
[ 760.218352] g4: fffff803ff0d5880 g5: fffff80007c8a000 g6: fffff803ff0ec000 g7: 00000000007bb6d0
[ 760.218368] o0: 000000000000fff0 o1: 0000000000000040 o2: 0000000000000034 o3: 0000000000000000
[ 760.218383] o4: 0000000100009332 o5: 0000000000000000 sp: fffff803ff0eee21 ret_pc: 000000000054de08
[ 760.218402] RPC: <__next_cpu+0x18/0x2c>
[ 760.218413] l0: 00000000007f0000 l1: 0000009980001602 l2: 0000000000455d2c l3: 0000000000000400
[ 760.218428] l4: 0000000000000000 l5: 0000000000000002 l6: 0000000000000000 l7: 0000000000000008
[ 760.218443] i0: 0000000000000033 i1: 00000000007bb6c8 i2: 0000000000000038 i3: fffff803f73bf100
[ 760.218459] i4: 0000000000845000 i5: 0000000000000401 i6: fffff803ff0eeee1 i7: 0000000000455d48
[ 760.218487] I7: <aggregate_group_shares+0x10c/0x16c>
[ 823.716459] INFO: task collect2:4106 blocked for more than 120 seconds.
[ 823.716680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 823.716815] collect2 D 00000000006b4a80 0 4106 4105
[ 823.716831] Call Trace:
[ 823.716839] [00000000006b4c40] schedule_timeout+0x20/0xa4
[ 823.716859] [00000000006b4a80] wait_for_common+0xf4/0x184
[ 823.716875] [000000000045f2cc] do_fork+0x1dc/0x234
[ 823.716894] [0000000000406214] linux_sparc_syscall32+0x3c/0x40
[ 823.716917] [0000000000023f50] 0x23f58
