Re: RCU stall when using function_graph

From: Paul E. McKenney
Date: Wed Aug 02 2017 - 08:49:34 EST


On Tue, Aug 01, 2017 at 03:04:05PM -0700, Paul E. McKenney wrote:
> > Hi Paul,
> >
> > I have been trying to set the function_graph tracer for ftrace and each time I
> > get a CPU stall.
> >
> > How to reproduce:
> > -----------------
> >
> > echo function_graph > /sys/kernel/debug/tracing/current_tracer
> >
> > This error appears with v4.13-rc3 and v4.12-rc6.
> >
> > Is it something already reported ?
>
> I have seen this sort of thing, but only when actually dumping the trace
> out, and I though those got fixed. You are seeing this just accumulating
> the trace?
>
> These RCU CPU stall warnings usually occur when something grabs hold of
> a CPU for too long, as in 21 seconds or so. One way that they can happen
> is excessive lock contention, another is having the kernel run through
> too much data at one shot.
>
> Adding Steven Rostedt on CC for his thoughts.

And Peter Zijlstra suggested this in response to a similar splat:

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core&id=15377ef4fe0c86eb7fa1099575b2e86357d99e42

Thanx, Paul

> > [ ... ]
> >
> > [ 472.934986] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 472.941302] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=3195
> > [ 472.948982] (detected by 3, t=6502 jiffies, g=549, c=548, q=10)
> > [ 472.955180] Task dump for CPU 0:
> > [ 472.958513] swapper/0 R running task 0 0 0 0x00000002
> > [ 550.954786] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 550.960657] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=12926
> > [ 550.967897] (detected by 3, t=26007 jiffies, g=549, c=548, q=27)
> > [ 550.974181] Task dump for CPU 0:
> > [ 550.977513] swapper/0 R running task 0 0 0 0x00000002
> > [ 628.974788] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 628.980659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=22666
> > [ 628.987900] (detected by 3, t=45512 jiffies, g=549, c=548, q=43)
> > [ 628.994184] Task dump for CPU 0:
> > [ 628.997518] swapper/0 R running task 0 0 0 0x00000002
> > [ 706.994787] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 707.000658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=31785
> > [ 707.007899] (detected by 3, t=65017 jiffies, g=549, c=548, q=74)
> > [ 707.014183] Task dump for CPU 0:
> > [ 707.017516] swapper/0 R running task 0 0 0 0x00000002
> > [ 707.024902] rcu_preempt kthread starved for 1261 jiffies! g549 c548 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> > [ 707.034548] rcu_preempt S 0 8 2 0x00000000
> > [ 726.049583] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds.
> > [ 726.061772] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 726.074294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 726.089027] jbd2/mmcblk1p2- D 0 1120 2 0x00000000
> > [ 726.126737] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds.
> > [ 726.140484] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 726.154515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 726.170140] rs:main Q:Reg D 0 2272 1 0x00000000
> > [ 726.205626]
> > [ 726.205626] Showing all locks held in the system:
> > [ 726.272952]
> > [ 726.278194] =============================================
> > [ 726.278194]
> > [ 785.014779] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 785.020658] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=41387
> > [ 785.027905] (detected by 3, t=84524 jiffies, g=549, c=548, q=94)
> > [ 785.034194] Task dump for CPU 0:
> > [ 785.037532] swapper/0 R running task 0 0 0 0x00000002
> > [ 846.879753] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds.
> > [ 846.892966] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 846.909597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 846.926726] jbd2/mmcblk1p2- D 0 1120 2 0x00000000
> > [ 846.981356] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds.
> > [ 847.001593] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 847.017642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 847.034135] rs:main Q:Reg D 0 2272 1 0x00000000
> > [ 847.077209]
> > [ 847.077209] Showing all locks held in the system:
> > [ 847.180736]
> > [ 847.186753] =============================================
> > [ 847.186753]
> > [ 863.034778] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 863.040660] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=50881
> > [ 863.047907] (detected by 4, t=104030 jiffies, g=549, c=548, q=114)
> > [ 863.054373] Task dump for CPU 0:
> > [ 863.057711] swapper/0 R running task 0 0 0 0x00000002
> > [ 941.054780] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 941.060659] 0-...: (3 GPs behind) idle=d1a/2/0 softirq=2556/2557 fqs=59726
> > [ 941.067903] (detected by 7, t=123535 jiffies, g=549, c=548, q=303)
> > [ 941.074370] Task dump for CPU 0:
> > [ 941.077708] swapper/0 R running task 0 0 0 0x00000002
> > [ 967.711682] INFO: task jbd2/mmcblk1p2-:1120 blocked for more than 120 seconds.
> > [ 967.725731] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 967.743347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 967.765658] jbd2/mmcblk1p2- D 0 1120 2 0x00000000
> > [ 967.810595] INFO: task rs:main Q:Reg:2272 blocked for more than 120 seconds.
> > [ 967.825360] Not tainted 4.13.0-rc3-ktest-linaro-hikey #202
> > [ 967.838595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 967.854460] rs:main Q:Reg D 0 2272 1 0x00000000
> > [ 967.897804]
> > [ 967.897804] Showing all locks held in the system:
> > [ 967.997067]
> > [ 968.002570] =============================================
> >
> >
> > [ ... ]
> >
> > etc ...