Re: Possible dcache BUG

From: Gene Heskett
Date: Sun Aug 22 2004 - 00:06:37 EST


On Friday 20 August 2004 16:17, Gene Heskett wrote:
>On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...]
>
>>There's a simple test you can do unless your DIMMs must go in pairs
>> (I don't remember if it's required by nforce2): remove one of them
>> and see what happens.
>
>To get dual channel DDR, they have to be in a pair. Since this
> post, they've been swapped one for the other, and I'll be curious
> to see if the address goes to an even address when it errors, which
> it hasn't yet.
>
It has now errored, once in 35 hours. The problem is considerably
reduced.

Whereas the error was always at an odd address, and in the 2nd LSbyte,
now it's still at an odd address, but the error has moved to the most
significant byte of a 32-bit fetch:

[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2308, elapsed 41225.98.
FAILED at round 2309/40220063: got ff000000, expected 00000000!!!
REREAD: ff000000, ff000000, ff000000!!!
[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2636, elapsed 60944.15.

As can be seen, I restarted it, and it has now run even more loops
without error. There have been no more Oopses, but with memburn eating
512 megs, half my RAM, and kde-3.3 under construction by konstruct,
I've peaked at nearly a gig of swap, with 754 megs in swap right now.
Sure, it's a bit laggy, but not unusable.
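
For anyone who wants to check the byte position rather than eyeball it,
here is a minimal sketch. It assumes the values on the FAILED and REREAD
lines above are 32-bit words printed in hex; the function names are
made up for illustration and are not part of memburn.

```python
def flipped_byte_lanes(got, expected):
    """Return the byte lanes (0 = least significant) where got differs from expected."""
    diff = (got ^ expected) & 0xFFFFFFFF  # XOR leaves a 1 in every bit that differs
    return [lane for lane in range(4) if (diff >> (8 * lane)) & 0xFF]


def is_sticky(got, rereads):
    """True when every reread returns the same wrong value as the first read."""
    return all(value == got for value in rereads)


# Values copied from the FAILED and REREAD lines above (assumed to be hex).
got, expected = 0xFF000000, 0x00000000
rereads = [0xFF000000, 0xFF000000, 0xFF000000]

print("lanes in error:", flipped_byte_lanes(got, expected))  # [3]
print("sticky:", is_sticky(got, rereads))                    # True
```

Lane 3 is the most significant byte, which matches the reading above.
The three identical rereads suggest the wrong value was read back
consistently rather than being a one-off glitch on a single read, but
that alone does not say whether the RAM or the cache is at fault.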

So now the question is since the error address is always odd, which
stick is it?

Or do I need to sanitize the dimm sockets somehow?

They sure seem to slip in and out easily enough for a socket with that
many contacts. No more than 3 pounds on each end will seat them, and if
the clips are re-opened they virtually fall out into your hand. I'm
rather more used to having to press 5 to 10 pounds on each end to
seat them.

Next time I have to reboot, I'm going to 'exercise' them in and out a
few times just to polish the oxide from the contacts.

>> If you can reproduce the same symptoms on
>> each of them separately, I'd bet on a cache problem.
>
>That makes sense, so I can try that too. I hadn't thought of that,
>duh!
>
>>Greetings,
>
>Someone else asked if ECC was on, but this board doesn't have it,
> and the memory has a blank pattern where the parity chip would be.
> So I think its safe to say no :)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.