Re: is killing zombies possible w/o a reboot?

From: Elladan
Date: Thu Nov 04 2004 - 21:42:23 EST


On Thu, Nov 04, 2004 at 08:39:34AM +0200, Denis Vlasenko wrote:
> On Thursday 04 November 2004 01:33, Russell Miller wrote:
> > On Wednesday 03 November 2004 17:03, Doug McNaught wrote:
> >
> > > It was already mentioned in this thread that the bookkeeping required
> > > to clean up properly from such an abort would add a lot of overhead
> > > and slow down the normal, non-buggy case.
> > >
> > I am going to continue pursuing this at the risk of making a bigger fool of
> > myself than I already am, but I want to make sure that I understand the
> > issues - and I did read the message you are referring to.
> >
> > I think what you are saying is that there is kind of a race condition here.
> > When something is on the wait queue, it has to be followed through to
> > completion. An interrupt could be received at any time, and if it's taken
> > off of the wait queue prematurely, it'll crash the kernel, because the
> > interrupt has no way of telling that.
>
> The problem is in locking. You must not kill process while it is
> in uninterruptible state because it is uninterruptible
> for a reason - has taken semaphore, or get_cpu(), etc.
> You do want it to do put_cpu(), right?
>
> Processes must never get stuck in D, it's a kernel bug.
>
> Find out how did process ended up in D state forever,
> and fix it - that's what I'm trying to do
> in these cases.

Perhaps it would be useful to add some debugging to the kernel for these
cases, somewhat akin to Ingo's preempt trace stuff?

If a process is in D state and receives a SIGKILL, assume it must exit
within a few seconds or it's a bug, and dump as much information about
it as is practical...?

-J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/