Re: scheduler went mad?

From: Valdis.Kletnieks@vt.edu
Date: Thu Apr 12 2001 - 09:57:08 EST


I've seen the same scenario about 2-3 times a week: kswapd and one or
more processes all CPU-bound, together using 100% of the CPU. I've had
'esdplay' hang on several occasions, and 2-3 times it's been
xscreensaver (3.29). The 'hung' processes are consistently immune to
kill -9, even as root, which suggests to me that they're stuck inside a
kernel call.

Sometimes, something *else* will exit, and everything will 'break loose'
and return to normal after a minute or so.

It *may* not be related, but I also have a lot of this in 'dmesg':

__alloc_pages: 4-order allocation failed.
__alloc_pages: 3-order allocation failed.
i810_audio: DMA overrun on send

There was a recent posting re: the i810_audio driver amounting to "I've got
one bug to fix and then I'll put up a patch" for the 'dma overrun' message.
__alloc_pages doesn't give much information on who its caller was, so
that's somewhat of a dead end...
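
For anyone reading along: the "N-order" in those messages is the
allocation order, i.e. a request for 2^N physically contiguous pages.
A minimal sketch of the arithmetic, assuming 4 KiB pages as on i386
(the helper names and SKETCH_PAGE_SIZE are made up for illustration,
they are not kernel symbols):

```c
#include <assert.h>

/* Assumption: 4 KiB pages (i386). Not a kernel macro, just for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL

/* An order-N request asks for 2^N physically contiguous pages. */
unsigned long order_to_pages(unsigned int order)
{
    assert(order < 20);  /* keep the shift well inside unsigned long */
    return 1UL << order;
}

/* Size in bytes of an order-N request under the page-size assumption above. */
unsigned long order_to_bytes(unsigned int order)
{
    return order_to_pages(order) * SKETCH_PAGE_SIZE;
}
```

So under that assumption the 4-order failure is a request for 16
contiguous pages (64 KiB) and the 3-order one is 8 pages (32 KiB).
Requests like that can fail from fragmentation even when plenty of
single pages are free.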

In page_alloc.c, __alloc_pages() has a 'goto try_again;' that makes it
loop around and try again to get memory. I'm wondering if the "hung"
process enters __alloc_pages() and gets wedged in the 'try_again' loop.
That loop calls wakeup_kswapd(), which would explain the high
context-switch rate. I'm not clear on how kswapd can get stuck and fail
to free anything, unless it ends up calling __alloc_pages() itself
indirectly and the PF_MEMALLOC bit isn't enough to get it the memory it
needs, causing a deadlock/loop between kswapd and
__alloc_pages()/wakeup_kswapd().
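
To make the guess above concrete, here is a toy userspace model of the
loop I have in mind. It is NOT the kernel code: struct sim_state,
sim_wake_reclaimer() and sim_alloc_pages() are hypothetical stand-ins,
and the max_retries bound exists only so the sketch terminates. In the
scenario I'm hypothesizing there would be no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the allocator state; all names here are hypothetical. */
struct sim_state {
    unsigned long free_pages;   /* pages available right now */
    unsigned long reclaimable;  /* pages the reclaimer could still free */
    unsigned long wakeups;      /* how many times the reclaimer was woken */
};

/* Stand-in for waking kswapd: frees one page if anything is reclaimable. */
static void sim_wake_reclaimer(struct sim_state *s)
{
    s->wakeups++;
    if (s->reclaimable > 0) {
        s->reclaimable--;
        s->free_pages++;
    }
}

/*
 * Retry loop in the style of 'goto try_again'. Returns true if the
 * request was satisfied, false if max_retries wakeups were not enough.
 */
bool sim_alloc_pages(struct sim_state *s, unsigned long want,
                     unsigned long max_retries)
{
    unsigned long tries = 0;

    assert(s != NULL);
try_again:
    if (s->free_pages >= want) {
        s->free_pages -= want;
        return true;
    }
    if (tries++ >= max_retries)
        return false;           /* the sketch gives up; the hang would not */
    sim_wake_reclaimer(s);
    goto try_again;
}
```

With reclaimable == 0 the caller does nothing but wake the reclaimer on
every pass, which is the kind of behaviour that would show up as both
sides burning CPU with a high context-switch rate.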

Unfortunately, I've just exhausted my ability to debug this one here.. ;)

I'm running the 2.4.3 kernel, with the following patches:

Reiserfs: 2.4.3-3.6.25.quota.bz2
linux-2.4.3-knfsd-6.g.patch.gz
linux-2.4.3-reiserfs-20010327.patch.bz2

IPv6: linux24-2.4.3-usagi-20010406.patch.gz
Crypto: patch-int-2.4.3.1

I'm using ReiserFS-on-LVM for basically all filesystems, if that matters...

-- 
				Valdis Kletnieks
				Operating Systems Analyst
				Virginia Tech




