"kernel@netravi.net" wrote:
> On 2.2.17 I had good luck with BP6.
Never tried the 2.2 series with this controller - probably should get
around to doing that this weekend. :S
> EX:
> [root@animal /root]# uptime
> 3:17am up 51 days, 20:17, 2 users, load average: 0.00, 0.04, 0.02
Uptime proves nothing alas; I could get 30+ days if I wanted :S
> Can you post/e-mail any additional details about the lockups. I am very
> curious about this. We have a huge mosix cluster in production with BP6
> mobo's and have plans to upgrade to 2.4 as soon as an official stable
> kernel is released.
The problem is not one of lockups. The system in question doesn't lock;
it doesn't crash; it doesn't even log anything at the time - not even
APIC errors.
Instead it quietly and silently corrupts exactly four bytes at a time;
mostly on the last four bytes of a 4096 block...
I can most easily cause the corruption by copying a large amount of
known data across the disk, and then checking for differences:
cp -aR /usr/src/linux /usr/src/l2 ; diff -dur /usr/src/linux /usr/src/l2
When it does corrupt in this manner, it is not so much of a concern - I
can detect, delete, and recopy the corrupted file(s).
What is more worrying is, what if it is quietly silently and happily
corrupting other data - when I'm NOT staring at it with a paranoid
concern; for example, compiling binaries; or altering large data
files...
As such, I cannot at this time reduce the problem to one of Software or
Hardware Error; and while it is A Major Problem in my eyes; I avoid it
currently by not using the hpt366 controller :)
Gerard Sharp
Two Penguins at 1024x768
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Dec 07 2000 - 21:00:17 EST