Re: memtest86, built into kernel

Ulrich Windl (Ulrich.Windl@rz.uni-regensburg.de)
Thu, 25 Apr 1996 16:45:30 +0200


On 25 Apr 96 at 10:24, Pat Crean wrote:

> > From: "Ulrich Windl" <Ulrich.Windl@rz.uni-regensburg.de>
> > Date: Wed, 24 Apr 1996 09:51:20 +0200
>
> > On 23 Apr 96 at 17:09, Karl Keyte wrote:
> >
> > > > >
> > > > > Given that it happens so rarely, that parity is only 50% likely to
> > > > > catch the error anyway, and that parity requires an extra 12.5% DRAM,
> > > > > it doesn't seem worth it to me. ECC is more useful, since it will
> > > > > correct single-bit errors rather than just hanging.
> > > > >
> > > > > -Matt
> > >
> > > No, surely the parity is virtually 100% certain to catch the error...??
> > > The only way it wouldn't is for more than the one bit to be in error
> > > in such a way that the parity becomes valid again. If bit errors are
> > > so rare, it's an unlikely situation, so the parity bits should be a
> > > good test. However, it's so rare, and parity bits themselves can be
> > > subject to error, I wouldn't bother with it. They don't either!
> >
> > The probability that a reported parity error is due to an error in the
> > parity bit is 1/9. Parity errors are rather rare; thus that type of
> > error is even more unlikely.
> >
> Actually it's considerably worse than that. If the only addition to
> the system was a single bit of memory, with all else being equal,
> then the probability that a reported parity error was actually caused
> by a failing parity bit would be 1/9. Unfortunately, all else is not
> equal; the circuitry to generate and check parity also has some
> finite, non-zero failure rate which increases this probability. As
> well, because parity must be calculated for every memory write, write
> timing is 5-10 nsec tighter for the parity chip than it is for the
> data bits, which also contributes to a higher failure rate for this
> bit.

This reminds me of Heisenberg's famous theory applied to RAM soft
errors: Either you can watch the parity errors, or you can believe
the manufacturer's estimates. But as soon as you try to confirm the
failure rate, it will increase (because you tried, how could you).

I'm not saying that you are wrong, but it seems a bit strange to me.
I won't talk any more about that (promised!).

> In the real world, it doesn't really matter. The cost of adding
> parity checking to the memory systems of PC systems is too high for
> the marginal return. There is no increase in reliability of the
> systems so equipped (in fact, there is a significant decrease). For
> those few applications that can't tolerate an occasional failure,
> there is ECC (whose actual hardware failure rate is, due to extra
> components, double or triple that of unprotected memory sub-systems).
> Of course, you will most likely notice a significant degradation in
> performance as well, since generating, storing, retrieving, and
> checking the error codes adds 10-15% to the access time of your memory.
> In short, if you're concerned about memory failures, by all means,
> run background memory tests, but don't think that adding a parity bit
> is going to buy you anything to speak of.
>
> Just my $.02 worth..... (no, I didn't get up on the wrong side of the
> bed this morning, but I did step on the coon hound sleeping on the
> right side.......)
>
> Pat
>
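
As a footnote to the 1/9 figure and to Karl's point about multi-bit
errors, here is a minimal sketch that enumerates the cases. It assumes
even parity over 8 data bits plus 1 parity bit, and that each of the
nine cells is equally likely to flip; the hardware effects Pat describes
(checker circuitry, tighter write timing) are deliberately not modelled.
All function names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

DATA_BITS = 8
WORD_BITS = DATA_BITS + 1   # assumption: bit 8 holds the parity bit


def encode(byte):
    """Return the 9-bit word: data byte plus an even-parity bit."""
    byte &= 0xFF
    parity = bin(byte).count("1") & 1      # 1 if the data has odd weight
    return (parity << DATA_BITS) | byte


def parity_ok(word):
    """Even parity holds when the 9-bit word has an even number of ones."""
    return bin(word & 0x1FF).count("1") % 2 == 0


def detection_stats(byte, n_flips):
    """Try every pattern of n_flips flipped bits; return (detected, total)."""
    word = encode(byte)
    patterns = list(combinations(range(WORD_BITS), n_flips))
    detected = 0
    for positions in patterns:
        corrupted = word
        for p in positions:
            corrupted ^= 1 << p
        if not parity_ok(corrupted):
            detected += 1
    return detected, len(patterns)


def parity_bit_share():
    """Share of single-bit errors located in the parity bit itself,
    assuming all nine cells fail with equal likelihood."""
    in_parity_bit = sum(1 for p in range(WORD_BITS) if p == DATA_BITS)
    return Fraction(in_parity_bit, WORD_BITS)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "flipped bit(s):", detection_stats(0xA5, n))
    print("share in parity bit:", parity_bit_share())
```

Under these assumptions every single-bit (and every odd-count) error is
flagged, every double-bit error slips through, and one in nine
single-bit errors sits in the parity bit itself.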