[2.6.23] tasks stuck in running state?

From: Jeff Garzik
Date: Fri Oct 19 2007 - 17:40:03 EST


On my main devel box, vanilla 2.6.23 on x86-64/Fedora-7, I'm seeing a certain behavior at least once a day. I'll start a kernel build (make -sj5 on this box), and it will "hang" in the following way:

31003 ? S 0:04 sshd: jgarzik@pts/0
31004 pts/0 Ss 0:02 \_ -bash
8280 pts/0 S+ 0:00 \_ make ARCH=i386 -sj4
8690 pts/0 Z+ 0:00 \_ [rm] <defunct>
8691 pts/0 S+ 0:00 \_ /bin/sh -c cat include/config/kernel.release 2> /dev/null
8692 pts/0 R+ 6:12 \_ cat include/config/kernel.release

Specifically, the symptom is a process, often a simple one like cat(1) or rm(1) or somewhere in check-headers, will stay in the running state, accumulating CPU time.

If I Ctrl-C the build, and start over, the build will normally -not- get stuck at the same point, but proceed to chew through one of a bazillion allmodconfig builds.

I also see this occasionally on my main workstation (also 2.6.23/x86-64/Fedora-7), though not as frequently.

This is a new behavior since the new scheduler was merged... I think.

Nothing more concrete to report at this time. I cannot easily reproduce the behavior, as it happens [apparently] randomly sometime during the day. Generally, the files these programs are dealing with are -always- in the pagecache, if that makes any difference.

Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/