Linux SMP problems: wait_on_bh

Fredrik Lindgren (fli@unix.edu.sollentuna.se)
Fri, 5 Feb 1999 23:59:39 +0100 (MET)


Hi!

I'm getting a little irritated at this, no make that very irritated...

We have a Compaq Proliant 850R with two PPro 200 CPUs and for as long as I
can remember we have been seeing these "wait_on_bh" lockups where the
machine would freeze up completely requiring a power cycle. This happened
in 2.0.x and is still happening in 2.2.1, less frequently though. I have
had both CPUs exchanged so it would seem very unlikely that it is hardware
problem.

This is the data from the latest crash (it was repeated atleast 5 times):

wait_on_bh, CPU 0:
irq: 0 [0 0]
bh: 1 [0 1]
<[c010a385] [c016881e] [c0162e6b] [c0163108] [c016f91e] [c01533cb]
[c01537c5] [c012611d]>

These are the mappings from System.map:

c010a385: synchronize_bh
c016881e: tcp_v4_unhash
c0162e6b: tcp_close_state
c0163108: tcp_close
c016f91e: inet_release
c01533cb: sock_release
c01537c5: sock_close
c012611d: __fput

I have posted about this problem two times before but I haven't received a
single reply. I guess it's an difficult problem. Quite frankly it is
beyond me to debug this, so I hope that someone else will be able to shed
some light on the matter. At the moment it's feeling a bit hopeless...

I keep reading about how stable Linux is all over the place and then in my
expreience it is worse off than NT, atleast in SMP configurations. To add
to the frustration there is this darn 30 minute FSCK everytime the thing
goes down... A journaling filesystem would sure would be nice.

Nevertheless, thanks alot for the effort you guys put into this, after all
it's free! :)

Regards,
Fredrik Lindgren
SolNET, Sweden

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/