wait_on_bh trouble on SMP box with 2.2.11

Ward Vandewege (ward.vandewege@pandora.be)
Sun, 29 Aug 1999 15:16:10 +0200


Hi,

I am having trouble with a dual PII-450Mhz box, with 4 SCSI disks
(totalling 22 GB). It has 31.000 e-mail and web accounts (!), and currently
runs vanilla 2.2.11. The machine is quite heavily loaded, with load
averages from 2 to 4, depending on the time of the day. Every now and then,
on average 2 to 3 times per day, it just locks up with the following on the
console:

wait_on_bh; CPU 0:
irq: 0 [0 0]
bh: 1 [0 1]
<[c010b24d]><[c014d9a6]><[c014d942]><[c012799f]><[c0128bbf]><[c0127a1a]>
<[c0127a84]><[c010915c]>

The machine is locked solid. Doesn't respond to ping anymore.
I've looked up these addresses in the System.map file:

<[c010b210]>: synchronize_bh
<[c014d8f8]>: sock_poll
<[c014d91c]>: sock_close
<[c0127980]>: __fput
<[c0128ba8]>: fput
<[c01279c8]>: filp_close
<[c0127a24]>: sys_close
<[c0109128]>: sys_call

I have seen reports of this problem earlier this year on this list, with
early 2.2.x kernels, but couldn't find a solution. Has anyone experienced
this with 2.2.11?

The box used to run 2.0.36, and was stable - uptimes of months. But the
load became too big, so we had to upgrade to a 2.2 kernel for better SMP
support. The hardware has not been changed.

I would be most grateful for any comments,

Ward Vandewege.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/