Re: is killing zombies possible w/o a reboot?

From: Alex Bennee
Date: Thu Nov 04 2004 - 12:24:34 EST


On Thu, 2004-11-04 at 10:04, Helge Hafting wrote:
> Russell Miller wrote:
> >On Wednesday 03 November 2004 16:15, Jim Nelson wrote:
> >
> >Anyway, is there a way to simply signal a syscall that it is to be interrupted
> >and forcibly cause the syscall to end?
> >
> There is a way. Processes go into D state happens all the time
> when waiting for disk io or similiar. Then the io happens a few ms later,
> and the fs or device driver tells the kernel to wake up the process
> so it gets a chance at the next scheduling opportunity. So the mechanism to
> unstick a prcess exists, and is used by every device driver that
> use sleeping. Which is most of them.
>
> Breakage happens when something never comes out of D-state.
> One could write a trivial syscall (or addition to "kill") that "wakes"
> processes waiting for io. It itsn't hard to do at all - just copy the
> waking code from any device driver. This will allow to kill and
> fully remove any process that hangs around in D-state. This might
> also release other stuck resources as the syscall
> continues, returns to userspace, and allows the process to die.
>
> Unfortunately, this isn't enough. In some cases the syscall
> expects the io device interrupt handler to have done something
> vital - but this haven't happened when we forcibly wakes a process.
> We can hope for an io error, but might get a crash instead. This
> can be fixes with a lot of work - basically check at every wakeup
> if the process were woken by this new killing mechanism and
> act accordingly. It shouldn't be hard, but _lots_ of work
> inspecting every sleeping point, at least every device driver.

Timeouts and interruptible sleeps are the two ways to solve the problem.
All good drivers should have covering timeouts in case the event they
where hoping for never happens.

If the code path that assumes magic has happened after it wakes up
doesn't check its not defensive enough. Also you can make tasks
interruptible so signals can get through:

result = wait_event_interruptible(dev->waitq,dev_irq_event(dev));

if (result) {
printk(KERN_ALERT "dev_irq_wait: Interrupted by a signal\n");
return -ERESTARTSYS;
};

As you have noted you can't always make things interruptible, but decent
timeouts should always exist. Hardware has bugs too!
--
Alex, Kernel Hacker: http://www.bennee.com/~alex/

In English, every word can be verbed. Would that it were so in our
programming languages.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/