Re: uninterruptible sleep lockups

From: Anthony DiSante
Date: Mon Feb 21 2005 - 15:28:28 EST


Valdis.Kletnieks@xxxxxx wrote:
>> It seems like this problem is always going to exist, because some hardware and some drivers will always be buggy. So shouldn't we have some sort of watchdog higher up in the kernel, that watches for hung processes like this and kills them?
>
> And said watchdog would clean up the mess, how, exactly? There's lots of sticky
> issues having to do with breaking locks and possibly still-pending I/O (I once had
> a tape drive complete an I/O 3 *days* after the request was sent - good thing no
> watchdog killed the process and deallocated the memory that I/O landed in ;)

I'm not a kernel programmer, so I don't have the answers to any of that. I guess I was thinking that there'd be some way to distinguish between processes that are truly stuck -- that is, never coming back -- and processes like yours, that are taking a long time but still working.

Or maybe it SHOULD have killed your process, in some "proper" way that prevents any outstanding I/O requests from coming in days later and breaking things. Again, I'm no kernel hacker, but if an I/O request takes *3 days*, isn't that an indication of a bug or of faulty hardware perhaps?
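
To make the "stuck vs. merely slow" question concrete, here is a rough userspace sketch, not a kernel watchdog and not anything proposed in this thread. It only lists processes that are in uninterruptible sleep (state D) at the moment it runs, by reading /proc/<pid>/stat. The function names parse_stat and d_state_processes are made up for illustration.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A single sample says nothing about whether a process will ever come back: the 3-day tape I/O would look exactly like a permanently hung one. At best, a process that shows up in every sample over a long period is a candidate for "stuck", which is why telling the two apart from outside is hard.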

> It's been covered before, look in the lkml archives for details.

Thanks, I'll do that. But could you give me a more specific pointer? Searching lkml for "uninterruptible" returns ~2000 results.

Thanks,
Anthony DiSante
http://nodivisions.com/