>> - (on 34pre2:) CPU #3: Attempted flush tlb IPI when not AKP (=255)
>> (followed by:)
>> - IRQ DEADLOCK DETECTED BY CPU [012] (and locked again...)
>Those two are almost always caused by loading a non-SMP module into an SMP
>kernel. I say almost because it's not always.

This seems to be one of those cases. Although I was pretty sure that I had
no modules other than those installed after the kernel compile, I did a rm
-rf /lib/modules/* and compiled the kernel (34pre2) with module support
turned off, only the bare necessities turned on (see below) and with the
latest tulip driver from cesdis.

The machine then hung with an Aiee scheduling in interrupt (in sys_idle,
according to when the RH bootscripts loaded kerneld (version
2.1.85). So I removed all modutils and tried again. This time I could
log in, but some simple compiles over NFS caused Aiee:'s again (each time in
a different location, once in read_chain and once in

I decided to try one more time (hadn't seen enough fscks yet ;-)):

Aiee: scheduling in interrupt (at the beginning of ret_from_sys_call)
gfp called nonatomically from interrupt 00000003
gfp called nonatomically from interrupt 00000003
gfp called nonatomically from interrupt 00000003
(and, after about 30 secs:)
(even after 15 mins there were no more of these)

With a similar config on 2.1.89pre5, everything is rock solid. Multiple
tcpsprays, kernel make -j's and NFS compiles and not even an oops. Too bad
about the jerky TCP, though.

(Alan, would you know where I should look to fix this TCP jerkiness?
Symptoms: on a switched 100Mbps net, kernels from as far back as 2.1.55
wait for ACKs from NT and IRIX boxen, although there is still room in the
window. Only about 300 kbps is used; tcpdump shows no lost datagrams. The
problem doesn't show when communicating with another Linux box, and these
200ms pauses play havoc with our real-time compressed video. Traces are
available upon request; this is 100% reproducible.)


Jan-Derk Bakker.


