Uninterruptible sleep D state (was Re: Unkillable rm -rf on

F Harvell (spam@fts.net)
Tue, 17 Feb 1998 10:52:57 -0500


On Tue, 17 Feb 1998 15:00:47 +0100, Mathias Froehlich wrote:
>
> Chan Shih-Ping wrote:
>
> > Sometimes running rm -rf on a directory causes an unkillable
> > rm process. It doesn't consume any CPU time but never returns.
> > The system cannot be shut down cleanly after this.
> >
>
> I have that problem too! Sometimes update hangs in the D state
> (the state field in top), which causes very exciting e2fsck runs!
> Or moves or copies of big files end in such an unkillable process.
>
> I have an ASUS P97LX-DS with two PII installed, and I use the onboard
> aic7880. I saw this problem with 2.1.8[56] (2.1.87 untested up to
> now). The problem occurs independent of the pirq=0 kernel argument.

I am having this problem as well. Since the processes are not
killable, they are locking up the resources that they use. I am able
to easily reproduce my problems by doing a dump to tape and an fsck
at the same time. (I haven't gotten a good backup in about a week.
:^( ) I also have seen update go into the D state. The final result
of that was a system crash.

I'm not sure where to start looking. Perhaps we can identify the
commonalities in our hardware/kernel config. I am willing to be a
clearing house for information and try to find some correlations. I
will publish a summary.

First, some observations:

1) I first saw processes locking up in the D state when trying to
make an iso9660 image back on kernel 2.1.36 with a RedHat 4.2 system.
(i.e., the root problem may be old).

2) The problem appears to occur when moving a significant amount
of data (e.g., dumps, mkisofs, fsck, swapping? in update?).

3) The problem appears to be getting worse as performance gets
better (e.g., with Ingo's new IO-APIC code, I'm now seeing it
regularly in dumps, etc.).

4) The problem does not appear to be associated with quotas or BSD
accounting. I have tried 2.1.8[56] kernels with and w/o quotas and
BSD accounting.

5) The problem was not fixed by Ingo's miniPatch for 2.1.86. (I
booted 2.1.87 but was scared by the immediate shmem oops.)

As for my system, I have a Tyan 1668D with dual PPro 150, a BusLogic
958 with two Micropolis Tomahawk drives (with 10 md striped
partitions and two pri=1 swap partitions), a BusLogic 946 with a
SCSI Travan NS8 tape drive, a tulip-based ethernet card, two
Millennium cards, and a SB 16.

Questions for anyone else experiencing processes getting stuck in
the D (uninterruptible sleep) state:

a) Is your system SCSI only? What controller/driver?

b) Is your system drive IDE? Do you have any SCSI devices?

c) Are you using md? What modes?

d) Do you have more than one swap partition? Are they equal priority?

e) What process(es) do you see hanging?

f) What type of ethernet controller/driver?

g) Are you running X windows? Which X server?
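
To make answers to question d) easier to compare, here is a rough
sketch that summarizes the swap areas and whether they share one
priority. It assumes the kernel exposes /proc/swaps with a header line
followed by one row per swap area whose last column is the priority;
that file may not exist on every kernel. The function names
(parse_swaps, swap_summary) are made up for illustration.

```python
def parse_swaps(text):
    """Return a list of (name, priority) from /proc/swaps-style text."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        # Assumption: first column is the device, last is the priority.
        entries.append((fields[0], int(fields[-1])))
    return entries


def swap_summary(entries):
    """Count swap areas and report whether all share one priority."""
    priorities = {priority for _, priority in entries}
    return {
        "count": len(entries),
        "equal_priority": len(entries) > 1 and len(priorities) == 1,
    }


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            found = parse_swaps(f.read())
    except OSError:
        found = []  # no /proc/swaps on this system
    for name, priority in found:
        print(name, priority)
    print(swap_summary(found))
```

The output of this, together with the contents of /proc/mdstat if you
use md, would cover questions c) and d).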

Also, I'm not all that fluent with the Linux debugging tools. If
anyone has suggestions for extracting more useful information
from the stuck processes, please let me know.
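
As one possible starting point for question e), here is a minimal
sketch that lists processes currently in the D state by reading
/proc/<pid>/stat. It assumes the usual "pid (command) state ..."
layout of that file. The per-process wchan file it also tries to read
is an assumption: it is present on later kernels and is simply
reported as "?" when missing. The helper names are hypothetical.

```python
import os


def parse_stat_line(text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command sits in parentheses and may contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def stuck_from_stat_lines(lines):
    """Return sorted (pid, command) pairs for entries in state D."""
    stuck = []
    for line in lines:
        try:
            pid, comm, state = parse_stat_line(line)
        except (ValueError, IndexError):
            continue  # malformed or empty line
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


def read_first_line(path):
    """Return the first line of a file, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return ""  # process exited or file not provided by this kernel


def find_d_state(proc_root="/proc"):
    """Scan numeric /proc entries and return the D-state processes."""
    pids = [e for e in os.listdir(proc_root) if e.isdigit()]
    lines = [read_first_line(os.path.join(proc_root, p, "stat")) for p in pids]
    return stuck_from_stat_lines(lines)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        # Assumption: wchan names where the process is sleeping.
        wchan = read_first_line(os.path.join("/proc", str(pid), "wchan"))
        print(pid, comm, wchan or "?")
```

Running this periodically during a dump or fsck and saving the output
would show which processes get stuck first and in what order.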

-- 
Mr. F Harvell                          Phone:407 696-4340
FTS International, Systems Division    Phone:407 399-0342 (cell)
3498 Buffam Place                        Fax:407 696-4244
Casselberry, FL 32707                 mailto:fharvell@fts.net
