Re: uninterruptible sleep lockups

From: Valdis . Kletnieks
Date: Mon Feb 21 2005 - 19:37:40 EST


On Mon, 21 Feb 2005 19:06:23 EST, Anthony DiSante said:

> The driver code for my devices has "been given" to me as part of the kernel.
> Any of a handful of USB devices has caused permanent D states, as has a
> CDROM and a NIC. I guess I'll need to start debugging all of these drivers.
> When something goes into permanent D sleep, what should I do to start
> tracking down the problem? Aside from obvious stuff like dmesg and checking
> /var/log/messages, neither of which ever seems to say anything useful when
> this happens.

Hit Alt-SysRq-T and provide the tracebacks for the wedged process(es). That,
and the other info suggested in the linux/REPORTING-BUGS file, will go a long
way toward actually getting things fixed.
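
For narrowing down which tasks to look for in that traceback dump, something
along these lines may help. It is only a rough sketch, not anything from the
kernel tree: it walks /proc, picks out tasks whose state field is D, and reads
/proc/<pid>/wchan where the kernel provides it. The function names
(parse_stat, find_d_state, trigger_task_dump) are made up for illustration.
Writing 't' to /proc/sysrq-trigger as root requests the same task dump as
Alt-SysRq-T, assuming the kernel was built with magic SysRq support.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state

def find_d_state(proc_root="/proc"):
    """List (pid, comm, wchan) for every task currently in state D."""
    wedged = []
    if not os.path.isdir(proc_root):
        return wedged  # no procfs here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited while we were reading it
        if state != "D":
            continue
        # wchan names the kernel function the task sleeps in, if available.
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        wedged.append((pid, comm, wchan))
    return wedged

def trigger_task_dump(path="/proc/sysrq-trigger"):
    """Ask the kernel for a task dump (same as Alt-SysRq-T). Needs root."""
    with open(path, "w") as f:
        f.write("t")

if __name__ == "__main__":
    for pid, comm, wchan in find_d_state():
        print(f"{pid}\t{comm}\t{wchan}")
```

Run it when something is stuck, note the PIDs and wchan values, then take the
task dump and pull the matching tracebacks out of dmesg for the bug report.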

> > Kernel bugs are not acceptable.
>
> That's a nice-sounding ideal, but the truth is that kernel bugs exist and
> are not uncommon.

Yes, but how do you write an "unwedge the hung process" daemon, given that said
daemon needs to know which bug the process hit in order to properly
unwedge it (at which point it's easier just to *fix* the frikking bug), and
also given that said unwedger will itself have bugs?

If you need further convincing, look at the OOM-killer code, which has
a lot of the same issues as a zombie-unwedger - and all *it* has to do is deliver
a 'kill -9' to the right process. It doesn't have to unsnarl memory allocations
and locks and semaphores and PCI resources and all the rest....
