Re: is killing zombies possible w/o a reboot?

From: Helge Hafting
Date: Wed Nov 03 2004 - 15:11:34 EST


On Wed, Nov 03, 2004 at 11:24:19AM -0500, Gene Heskett wrote:
> On Wednesday 03 November 2004 09:33, bert hubert wrote:
> >On Wed, Nov 03, 2004 at 07:51:39AM -0500, Gene Heskett wrote:
> >> But I'd tried to run gnomeradio earlier to listen to the
> >> elections,
> >
> >Depressing enough.
> >
> >> I'd tried to kill the zombie earlier but couldn't.
> >> Isn't there some way to clean up a &^$#^#@)_ zombie?
> >
> >Kill the parent, is the only (portable) way.
>
> The parent would have been the icon. It opened its usual sized small
> window, but never did anything to it. I clicked on closing the
> window, but 10 seconds later the system asked me if I wanted to kill
> it as it wasn't responding. I said yes, the window disappeared, but
> kpm said gomeradio was still present as process 8162, and that wasn't
> killable. Funny thing is, on the reboot, it automaticly self
> restored and ran just fine.
>
> I consider this as one of linux's achilles heels. Such a hung and
> dead process can be properly disposed of by a primitive os called os9
> because it keeps track of all resources in tables in the kernel
> memory space. Issueing a kill procnumber removes the process from
> the exec queue, reclaims all its memory to the system free memory
> pool, and removes it from the IRQ service tables if an entry exists
> there. Near instant, total cleanup, nothing left, in about 250
> microseconds max. 1.79 mhz cpu's aren't quite instant :)
>
Killing a process in linux with "kill -9 oid" also release all resources,
such as memory and file descriptors. The resource consumption of a
"zombie" is measured in bytes, not kilobytes.

> Lets just say that I think having to reboot because of a zombie that
> has resources locked up, and have the reboot fubared by it too,
> aren't exactly friendly actions.
>
Did you try logging out from the graphical user interface,
and then logging in again?
GUI programs are usually children of the window manager (or some
app launcher, all of these quit when you log out. A plain
zombie started from the GUI will disappear after that.

Only something stuck in a device driver will need the reboot,
but that tends to be a bug in the driver.
You can try unloading the driver module, but linux has a
nasty tendency to answer that with an OOPS or worse. When
something goes wrong - it does so properly and thourougly. :-)

> I fully realise that linux has a much more complex method of
> allocating resources, but doesn't it *know* exactly what resources
> have been passed out to each process?
>
Yes it does - the problem is that not all resources are managed
by processes. Some allocations are managed by drivers, so a driver
bug can get the device into a unuseable state _and_ tie up the
process(es) that were using the driver at the moment.

> And why is there no entry from the kill function into that resource
> management portion of the kernel so that this could also be done by
> the linux kernel, say with a "kill --total procnumber"?
>
Interesting, but you might need a path from "kill" into
every device driver. :-/ And of course it wtill won't work
if there is a bug in the driver.

> Seems like a heck of a good question to me since an os written to run
> on a 64k machine in 1981, and expanded to run on a 128K to 2 megabyte
> machine in 1986 can do it just fine. Even if that process is still
> running and spitting out data to its parent window/shell! Or if its
> crashed and scribbled over all its memory, makes no difference to
> os9. You (root) wants it gone, fine, its gone.
>
Can os9 do this if the process is busy calling into a buggy
device driver that simply doesn't return or perhaps believes
that some dma operation into process memory is taking forever?
Or perhaps os9 doesn't have lots and lots of drivers written by
different people with varying competence?

Often, the real solution is to fix the driver to deal with
"unexpected" conditions.

Helge Hafting

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/