Re: uninterruptible sleep lockups

From: Valdis . Kletnieks
Date: Mon Feb 21 2005 - 15:55:43 EST


On Mon, 21 Feb 2005 15:24:21 EST, Anthony DiSante said:
> Or maybe it SHOULD have killed your process, in some "proper" way that
> prevents any outstanding I/O requests from coming in days later and breaking
> things. Again, I'm no kernel hacker, but if an I/O request takes *3 days*,
> isn't that an indication of a bug or of faulty hardware perhaps?

Right. And if you're an automated program trying to clean up after a *bug*,
what do you do? It's quite likely that somebody's borked a lock - in which
case it may even be that the "hung" process is the victim rather than the
culprit, and breaking the lock will just make things worse. Similar issues
apply to *all* of the resources the wedged process has attached to it.

When these things get posted on lkml, it almost always involves quite a bit
of code introspection and scratching of heads before we figure out how the
system *got* its figurative head wedged into that crevice. Until you
figure out how it *got* there, a safe cleanup is in general impossible. And
we haven't seen yet the automatic program that can introspect the code to that
detail - even the Stanford automated checker and sparse and the like are quite the
impressive pieces of work.

> > It's been covered before, look in the lkml archives for details.
>
> Thanks, I'll do that. But could you give me a more specific pointer?

See the thread rooted here:

Date: Wed, 03 Nov 2004 07:51:39 -0500
From: Gene Heskett <gene.heskett@xxxxxxxxxxx>
Subject: is killing zombies possible w/o a reboot?
Sender: linux-kernel-owner@xxxxxxxxxxxxxxx
To: linux-kernel@xxxxxxxxxxxxxxx
Reply-to: gene.heskett@xxxxxxxxxxx
Message-id: <200411030751.39578.gene.heskett@xxxxxxxxxxx>

Attachment: pgp00000.pgp
Description: PGP signature