Re: 1.3.63 clears my "Aiee: scheduling in interrupt" problem

Brian Dowling (bdowling@tanelorn.ccs.neu.edu)
Thu, 15 Feb 96 09:16:23 -0500


> I just brought up 1.3.63 and my Aiee problem can no longer be triggered.
> (That was a tcp connection to localhost with diald/slip/ppp.) Will see
> how it does otherwise.
> --
> Pete Clements
> clem@clem.digital.net

For the record, I also no longer have this Aiee problem with 1.3.63. I've had
my box up with this version for only about 8 hours, but no problems so far.

My symptoms with 1.3.6[21] (I didn't succeed in compiling 1.3.58-.60) were
the same 'Aiee: scheduling in interrupt' problem reported by many others. I
found that with diald running (0.11, and I also tried the 0.12 beta), I
could consistently provoke the crash. If I killed diald, I could not cause it
to happen. While diald was running, it had no problems bringing my net
up/down/etc.; only tcp connections to localhost were a problem. I could
trigger the bug with my ppp link either up or down, as long as diald was
running.

Does anyone know what the cause of this was? I was trying to probe into it at
the time I noticed 1.3.63 was available, and I had started to make some
debugging kernel mods to 1.3.63 before I compiled it -- then once I had it up,
I realized it wasn't a problem anymore. :)

Curiously, one of the things I was trying to do was run strace on diald. The
crash, however, happened so fast and spewed so many errors that I couldn't
trap any useful information. Now that 1.3.63 is stable, I can do this. For
some unknown reason, when I telnet 'localhost', diald is processing all kinds
of 'stuff' when my connections are active. I can see it reading it all on a
socket. I haven't had time to look into this further yet, but that doesn't
seem right, at least not for localhost connections. I have a localhost route
to 'lo' (actually it's a -net 127.0.0.0, but either does the same). Perhaps
this is just some /proc file, but a quick glance over the diald sources doesn't
reveal what it could be.

FWIW, I also notice that 'telnet localhost Unused_port' times out instead of
giving 'Connection refused'. Is this related?
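
For anyone who wants to check the timeout-versus-refused behaviour without
telnet, something along these lines might do. It is only a rough sketch in
Python, not anything taken from diald or the kernel; the function name
'probe' and the 5-second timeout are assumptions made for illustration.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt.

    Returns 'connected', 'refused', or 'timeout'. Any other socket
    error is left to propagate so it is not silently misreported.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)  # assumed value; adjust to taste
    try:
        s.connect((host, port))
        return "connected"
    except socket.timeout:
        # No answer at all within the timeout: the symptom described above.
        return "timeout"
    except ConnectionRefusedError:
        # The stack answered with a reset: the normal result for a closed port.
        return "refused"
    finally:
        s.close()
```

Calling probe("localhost", port) with a port nothing is listening on would
normally be expected to return 'refused'; a 'timeout' there would match what
telnet shows.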

...brian