Re: (3c509) eth0: Missed interrupt

Paul Gortmaker (gpg109@rsphy1.anu.edu.au)
Tue, 12 Mar 1996 13:00:52 +1100 (EST)


Petri Kaukasoina, Mon, 11 Mar 1996 10:08:31 +0200 (EET):

>See the www-page of the author of the 3c509 driver (Donald Becker):
>http://cesdis.gsfc.nasa.gov/linux/drivers/3c509.html.
>The fix is to define final_version flag.

No, not exactly true. Look closely at what is on Don's page. It says with
respect to the "Missed interrupt, status then 2011 now 2000" error message:

"... The driver then thinks the interrupt line is broken, and prints the
message. As part of printing the message, it check the interrupt status
again. Note that the "now" value has the interrupt cleared, so it was
handled after all."

But the errors people have been reporting are "then 2011 now 2011"
(i.e. the "now" value is the same as the "then" value) meaning that the
interrupt probably *wasn't* handled after all.
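
To make the distinction concrete, here is a rough sketch of the check I
am doing by eye on those two numbers. The mask and the function name are
my own assumptions for illustration (I am treating the low byte of the
status word as the pending-interrupt bits); this is not code from the
driver.

```c
#include <stdio.h>

/* Assumption: the low byte of the status word holds the pending
 * interrupt bits; the upper bits are not interrupt sources. */
#define IRQ_BITS 0x00ffu

/* Hypothetical helper: returns 1 if any interrupt bit that was set in
 * the "then" status is still set in the "now" status, else 0. */
static int irq_still_pending(unsigned then, unsigned now)
{
    return (then & now & IRQ_BITS) != 0;
}

/* Print a one-line verdict for a "then/now" pair from the log message. */
static void report_missed_irq(unsigned then, unsigned now)
{
    printf("then %4.4x now %4.4x: %s\n", then, now,
           irq_still_pending(then, now) ? "still pending, NOT handled"
                                        : "cleared, handled after all");
}
```

Feeding it the case from Don's page, report_missed_irq(0x2011, 0x2000),
gives "handled after all", while the 0x2011/0x2011 pair people are
reporting gives "NOT handled".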

I *suspect* that there is a cli/sti pair somewhere that contains too much
processing inside it, causing an unacceptable interrupt latency. Given
that the 3c509 requires a low interrupt latency, it is the first device
to complain. The proper fix is to find if/where this happens, and not just
#ifdef out the offending message via "#define final_version". Apparently
it wasn't a problem until about 1.3.4x, so perhaps the overlong cli/sti
section was added around then.

An idea I was thinking about to try and debug this is to start some magical
timer in cli(), and stop it in sti() -- via changing these macros to
inline functions containing this "timer code" and the original __asm__ as
well. If more than a "reasonable" amount of time has passed, sti()
would scream loudly and then dump its EIP so that we could tell where
in the source the interrupts are being left off too long.
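
Something along these lines is what I have in mind. This is only a
user-space sketch of the idea, not a kernel patch: traced_cli() and
traced_sti() are made-up names, the 50 ms budget is an arbitrary
placeholder, clock_gettime() stands in for whatever timer would really
be usable with interrupts off, and GCC's __builtin_return_address()
stands in for "dump the EIP". The real cli/sti instructions are left as
comments.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L    /* for clock_gettime() */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical budget: complain if "interrupts off" lasts longer. */
#define CLI_BUDGET_NS 50000000LL   /* 50 ms, placeholder value only */

static long long cli_start_ns;     /* time of the last traced_cli() */
static void *cli_site;             /* return address of that call */
static unsigned long cli_overruns; /* sections that blew the budget */

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Stand-in for cli(): start the timer and remember who called us. */
__attribute__((noinline)) static void traced_cli(void)
{
    cli_site = __builtin_return_address(0);
    cli_start_ns = now_ns();
    /* the original __asm__ for "cli" would go here */
}

/* Stand-in for sti(): stop the timer, scream if the section ran long.
 * Returns the elapsed time in nanoseconds. */
__attribute__((noinline)) static long long traced_sti(void)
{
    long long elapsed = now_ns() - cli_start_ns;
    /* the original __asm__ for "sti" would go here */
    if (elapsed > CLI_BUDGET_NS) {
        cli_overruns++;
        fprintf(stderr,
                "sti: interrupts off for %lld ns, cli at %p, sti at %p\n",
                elapsed, cli_site, __builtin_return_address(0));
    }
    return elapsed;
}
```

The functions are marked noinline so that the return address points at
the caller rather than somewhere inside the wrapper. Nested cli/sti
pairs are not handled here; only the most recent traced_cli() is
remembered.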

Oh well. Sounds good in theory anyways... :-)

Paul.