Re: Work around a lockup?

From: Jan Engelhardt
Date: Tue Nov 16 2004 - 15:24:34 EST


>No driver code should ever wait forever. Some module code may
>be broken where the writter assumed that some bit must eventually
>be set or some FIFO must eventually empty, etc. Hardware breaks.

The box has locked up and I would like to know if there's a way around it.

>If you need to wait a long time for something, you can execute
>schedule_timeout(n) in your counted loop. This will give up
>the CPU to other tasks while you are waiting. More sophisticated
>code sleeps until interrupted, etc. Of course, the interrupt
>may never happen so your driver needs to plan for that too.

Let's *do* assume that some module's algorithm is not perfect, and further
assume that ATM, it's in an endless loop. Moreover, editing the module's source
is not an option.

This is not a homework or something, it's real. And I do not know where it's
hanging. Sure, SYSRQ+P would tell me where, but that could get hard to track if
it's the Nth stack frame (seen from the inner-most) for big N.

So for the moment to keep downtimes small, best option would be to have
something to circumvent the blocker process. E.g. putting it to sleep and
(then, finally, when I regain control) poke with the module's/kernel's source.

I've generalized the case into the above-mentioned for(;;); because that's the
worst case for uniprocessors, and I think it's best to start tackling there.


Jan Engelhardt
--
Gesellschaft fÃr Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 GÃttingen, www.gwdg.de
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/