Re: rcu_sched_state detected stall on CPU 0, 3.0-rc2

From: Andy Isaacson
Date: Sun Jun 12 2011 - 19:56:44 EST


Let's CC netdev and linux-pm since this is obviously a suspend issue,
and may have something to do with ethtool.

On Sun, Jun 12, 2011 at 04:11:43PM -0700, Andy Isaacson wrote:
> On Sun, Jun 12, 2011 at 12:58:56PM -0700, Andy Isaacson wrote:
> > My Thinkpad x201s threw some errors (?) a few minutes after resuming
> > from suspend-to-ram this morning.
> >
> > [56415.672140] INFO: rcu_sched_state detected stall on CPU 0 (t=15000 jiffies)
> >
> > Nothing jumps out of the backtraces at me. Full dmesg and config
> > attached. This was my first StR since upgrading from 2.6.39, let's see
> > if it fails again when I suspend after sending this email. :)
>
> I haven't had a fully successful StR cycle yet (in 5 tries), although I
> can't pin them all on RCU. On try 2 it hung completely about 10 seconds
> after I unlocked the screensaver, on try 3 it came back to a black
> console, and on try 4 it didn't suspend at all (blinking moon LED but
> battery LED and CPU fan still on).

Of course now that I'm trying to debug, I am seeing many successful
suspend-resume cycles. I don't see any signs of difference between the
cases that hung and the cases that are now succeeding.

CCing netdev, because I suspend by running pm-suspend, and in at least
one failure, an ethtool running under pm-suspend seemed to be the
problem:

root 11558 pts/8 S+ \_ /bin/sh /usr/lib/pm-utils/sleep.d/00powers
root 11559 pts/8 S+ \_ /bin/sh /usr/sbin/pm-powersave
root 11576 pts/8 S+ \_ /bin/sh /usr/lib/pm-utils/power.d/
root 11577 pts/8 D+ \_ ethtool -s eth0 wol g

Many processes were stuck in D (uninterruptible sleep) state:

USER       PID   VSZ  RSS STAT START COMMAND
root     11493     0    0 D    16:11 \_ [kworker/u:15]
nobody    1707 21472  992 D    14:31 dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
adi      11606 41004 2424 D+   16:13 | \_ ssh hex
root     11577  4092  324 D+   16:11 | \_ ethtool -s eth0 wol g
root     11595 22108  892 D+   16:12 | \_ sudo cat /proc/11577/stack
root     11604 22108  900 D+   16:13 | \_ sudo cat /proc/11577/stack

==> /proc/11577/wchan <==
synchronize_sched
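
For anyone who wants to collect the same information in one pass, here is a
minimal sketch that walks /proc, picks out tasks in state D, and prints the
wchan of each one. The function names (parse_stat, read_text, d_state_tasks)
and the proc_root parameter are made up for illustration and are not part of
pm-utils or any existing tool. It reads only /proc/<pid>/stat and
/proc/<pid>/wchan, and wchan may be empty or "0" depending on kernel config
and permissions.

```python
import os


def parse_stat(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so cut at the last ')' instead of splitting on whitespace.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return comm, state


def read_text(path):
    """Read a small /proc file; return '' if the task is gone or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for every task currently in state D."""
    tasks = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # task exited between listdir() and open()
        comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            tasks.append((int(entry), comm, wchan))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan in d_state_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Run as root during a hang, this should show entries like the ethtool task
above alongside its wait channel. It will not work if the shell itself ends up
blocked in D, as the "sudo cat /proc/11577/stack" attempts did here.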

-andy

Attachment: trace.gz
Description: Binary data

Attachment: config-trim.gz
Description: Binary data