ext3 deadlock or Re: [2.6.23] tasks stuck in running state?

From: Guennadi Liakhovetski
Date: Sun Oct 21 2007 - 16:11:25 EST


On Fri, 19 Oct 2007, Ray Lee wrote:

> On 10/19/07, Jeff Garzik <jeff@xxxxxxxxxx> wrote:
> > On my main devel box, vanilla 2.6.23 on x86-64/Fedora-7, I'm seeing a
> > certain behavior at least once a day. I'll start a kernel build (make
> > -sj5 on this box), and it will "hang" in the following way:
> >
> > > 31003 ? S 0:04 sshd: jgarzik@pts/0
> > > 31004 pts/0 Ss 0:02 \_ -bash
> > > 8280 pts/0 S+ 0:00 \_ make ARCH=i386 -sj4
> > > 8690 pts/0 Z+ 0:00 \_ [rm] <defunct>
> > > 8691 pts/0 S+ 0:00 \_ /bin/sh -c cat include/config/kernel.release 2> /dev/null
> > > 8692 pts/0 R+ 6:12 \_ cat include/config/kernel.release
> >
> > Specifically, the symptom is a process, often a simple one like cat(1)
> > or rm(1) or somewhere in check-headers, will stay in the running state,
> > accumulating CPU time.
> >
> > If I Ctrl-C the build, and start over, the build will normally -not- get
> > stuck at the same point, but proceed to chew through one of a bazillion
> > allmodconfig builds.
>
> I *think* I'm seeing this with firefox under 2.6.23-rc6. I tried a
> `killall -SIGSTOP firefox; killall -SIGCONT firefox` and when I looked
> back it was back to life again, but that may have been a fluke.
> Regardless, try that the next time it happens?

I don't know if this is the same problem as above, but a few minutes ago my
mail server locked up completely. First pine froze, then more processes
started freezing, and then the system became unusable: ssh logins and both
the USB and PS/2 keyboards got stuck. I managed to get a trace with the "w"
SysRq:

SysRq : Show Blocked State
task PC stack pid father
syslogd D c01275a3 0 2818 1
e8a1fe4c 00000086 e8a1fe4c c01275a3 f7e9a200 00000282 e8a1fe5c 00b5ce60
c1b05cc0 e8a1fe7c c030c8e7 e8a1ff30 c8d553c4 c0407348 c1b58b18 00b5ce60
c01277d0 c1b1ba90 c0407040 000000da d5f806c0 e8a1fe84 c030c974 e8a1feb4
Call Trace:
[<c030c8e7>] schedule_timeout+0x47/0xc0
[<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
[<c01b94cb>] journal_stop+0xcb/0x270
[<c01ba1fd>] journal_force_commit+0x1d/0x30
[<c01b2265>] ext3_force_commit+0x25/0x30
[<c01ac7dc>] ext3_write_inode+0x2c/0x40
[<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
[<c01849b4>] sync_inode+0x24/0x60
[<c01a8d42>] ext3_sync_file+0xc2/0xd0
[<c0187340>] do_fsync+0x60/0xa0
[<c01873a8>] __do_fsync+0x28/0x40
[<c01873ed>] sys_fsync+0xd/0x10
[<c010424e>] sysenter_past_esp+0x5f/0x85
=======================
pine D c01275a3 0 6910 6243
d1f55e4c 00200082 c03c0e40 c01275a3 f7e42c80 00200282 d1f55e5c 00b5ce60
c1b05cc0 d1f55e7c c030c8e7 d1f55f30 c8d553d8 c1b58b18 c90a9e5c 00b5ce60
c01277d0 d8ae8a90 c0407040 000000c3 d5f806c0 d1f55e84 c030c974 d1f55eb4
Call Trace:
[<c030c8e7>] schedule_timeout+0x47/0xc0
[<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
[<c01b94cb>] journal_stop+0xcb/0x270
[<c01ba1fd>] journal_force_commit+0x1d/0x30
[<c01b2265>] ext3_force_commit+0x25/0x30
[<c01ac7dc>] ext3_write_inode+0x2c/0x40
[<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
[<c01849b4>] sync_inode+0x24/0x60
[<c01a8d42>] ext3_sync_file+0xc2/0xd0
[<c0187340>] do_fsync+0x60/0xa0
[<c01873a8>] __do_fsync+0x28/0x40
[<c01873ed>] sys_fsync+0xd/0x10
[<c010424e>] sysenter_past_esp+0x5f/0x85
=======================
sendmail D c01275a3 0 7448 7446
c90a9e4c 00000082 c03c0e40 c01275a3 f7e00580 00000282 c90a9e5c 00b5ce60
c1b05cc0 c90a9e7c c030c8e7 c90a9f30 c8d553ec d1f55e5c c0407348 00b5ce60
c01277d0 d8ae8030 c0407040 0000004b d5f806c0 c90a9e84 c030c974 c90a9eb4
Call Trace:
[<c030c8e7>] schedule_timeout+0x47/0xc0
[<c030c974>] schedule_timeout_uninterruptible+0x14/0x20
[<c01b94cb>] journal_stop+0xcb/0x270
[<c01ba1fd>] journal_force_commit+0x1d/0x30
[<c01b2265>] ext3_force_commit+0x25/0x30
[<c01ac7dc>] ext3_write_inode+0x2c/0x40
[<c0183f8b>] __writeback_single_inode+0x30b/0x3e0
[<c01849b4>] sync_inode+0x24/0x60
[<c01a8d42>] ext3_sync_file+0xc2/0xd0
[<c0187340>] do_fsync+0x60/0xa0
[<c01873a8>] __do_fsync+0x28/0x40
[<c01873ed>] sys_fsync+0xd/0x10
[<c010424e>] sysenter_past_esp+0x5f/0x85
=======================
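
Since all the blocked tasks appear to be waiting in the same place, a quick
way to confirm that from a longer dump is to group tasks by their call
trace. Below is a minimal sketch, assuming the trace layout shown above (a
"name state PC stack pid father" line followed by "[<addr>] func+0x.../0x..."
frames). The names parse_blocked_tasks and group_by_trace are made up for
illustration and are not part of any kernel tool.

```python
import re
from collections import defaultdict

# Task header line, e.g. "syslogd D c01275a3 0 2818 1"
# (assumed layout: comm, state letter, PC, free stack, pid, parent pid).
TASK_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{8}\s+\d+\s+(?P<pid>\d+)\s+\d+$"
)

# Stack frame line, e.g. "[<c01b94cb>] journal_stop+0xcb/0x270"
FRAME_RE = re.compile(
    r"^\[<[0-9a-f]+>\]\s+(?P<func>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_blocked_tasks(text):
    """Return a list of {"comm", "pid", "frames"} dicts from a SysRq-w dump."""
    tasks = []
    for raw in text.splitlines():
        line = raw.strip()
        match = TASK_RE.match(line)
        if match:
            # A new task header starts a new, initially empty, frame list.
            tasks.append(
                {"comm": match["comm"], "pid": int(match["pid"]), "frames": []}
            )
            continue
        match = FRAME_RE.match(line)
        if match and tasks:
            # Frames belong to the most recently seen task header.
            tasks[-1]["frames"].append(match["func"])
    return tasks


def group_by_trace(tasks):
    """Map each distinct tuple of function names to the (comm, pid) pairs that share it."""
    groups = defaultdict(list)
    for task in tasks:
        groups[tuple(task["frames"])].append((task["comm"], task["pid"]))
    return dict(groups)
```

Run over the three traces above, this should put syslogd, pine and sendmail
into a single group, all sleeping in journal_stop() underneath sys_fsync().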

Now you see why I wrote "ext3 deadlock." This is my VIA C7 system, running
2.6.23-rc9-g804b3f9a; it had shown no problems since 5 October, when the
kernel was built. There are no Oopses or warnings in dmesg. Or has this been
fixed since 23-rc9?

Thanks
Guennadi
---
Guennadi Liakhovetski