On Thu, Dec 29, 2005 at 01:33:47AM -0800, Chris Stromsoe wrote:
The machine is a dual P4 Xeon with hyperthreading on. It can probably get by with only one cpu enabled. If/when it goes down again, I'll boot with nosmp. For what it's worth, I ran a Dell memory tester ("MP Memory") which claims to test all of the CPUs for a few hours and didn't come up with anything. The machine feeds usenet and is seeing a lot more io than cpu. (There are two Adaptec controllers, 4 channels, aic79xx, 5 drives on one channel, 3 unused, spool is on a 4 disk raid5, jfs formatted.)
OK, I've found two old, similar reports from people running news servers:
http://www.ussg.iu.edu/hypermail/linux/kernel/0308.1/0807.html
http://seclists.org/lists/linux-kernel/2004/Jan/5699.html
Both were using an SMP server with an AIC7xxx adapter, with kernels ranging from 2.4.18 to 2.4.24. One of them used XFS rather than JFS, so we can rule out a JFS-specific cause for now.
If you feel brave, you can try switching the AIC7xxx driver to Justin Gibbs' more recent version. It has not evolved during the last year, but I have it running reliably on production servers:
http://people.freebsd.org/~gibbs/linux/
I also have it rediffed for recent kernels if you prefer:
http://w.ods.org/kernel/2.4-wt/2.4.32-wt2/patches-2.4.32-wt2/pool/aic79xx-20040522-linux-2.4.30-pre3.rediff
Out of curiosity, it would be interesting to disable swap if you have it enabled.
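For the swap test, "swapoff -a" as root turns off all active swap areas, and /proc/swaps lists whatever is still in use. If you want to log that state alongside your other monitoring, here is a rough sketch, assuming Python is available on the box. The function name active_swap_areas is only illustrative, not an existing tool, and it only parses text in the /proc/swaps layout (a header line, then one line per area):

```python
def active_swap_areas(swaps_text):
    """Parse text in /proc/swaps format.

    Returns a list of (name, type, size_kb, used_kb) tuples,
    one per active swap area. An empty list means no swap is in use.
    """
    areas = []
    # The first line is the column header; each later line is one swap area.
    for line in swaps_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            # Skip blank or truncated lines instead of failing on them.
            continue
        name, kind = fields[0], fields[1]
        size_kb, used_kb = int(fields[2]), int(fields[3])
        areas.append((name, kind, size_kb, used_kb))
    return areas
```

Calling active_swap_areas(open("/proc/swaps").read()) before and after swapoff should show the list going empty if swap was really disabled.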