Re: NMI errors in 2.0.30??

Rogier Wolff (R.E.Wolff@bitwizard.nl)
Mon, 28 Apr 1997 17:49:40 +0200 (MET DST)


Gabriel Paubert wrote:
> There are however a few tricks to get such a program successfully
> running. The only solution I see for now is to copy all the code and data
> to video memory and execute from there. This type of program must be
> somehow like memtest86: stand-alone booting from floppy (no OS would
> survive since it will end corrupting memory for sure).

I see lots of people writing memory testers who go to a lot of
trouble to test ALL of the memory, even the memory that the memtest
program itself is in. In my eyes this is not really necessary.

The memory test program, together with the kernel, will occupy at most
1/8 of main memory (a 1MB kernel on an 8MB machine). What I've seen so
far is that BIOSes will detect the "simple" errors like "bit xxyyzz
stuck at 0". The more complicated errors, like "when DMA is going on, a
write of 0xfffffff to a memory location with 0x00000 as the lower 20
bits of address will fail by writing a few bits as zeroes", are not so
tied to one particular location that testing ALL memory is going to
make a difference.
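
To make the example pattern concrete, here is a minimal sketch in C of a
check aimed at just those addresses. It is an illustration only: the
names check_aligned_writes and LOW_BITS_MASK and the visited counter are
made up, the addresses are whatever the program sees (virtual addresses
under an OS, not physical ones), and it generates no DMA traffic, which
a real test for this kind of fault would have to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: "lower 20 bits zero" means 1MB-aligned. */
#define LOW_BITS_MASK ((uintptr_t)0xfffff)

/* Write `pattern` to every word in [base, base + bytes) whose address has
 * its lower 20 bits clear, read it back, and count mismatches.
 * *visited receives the number of locations that were exercised. */
size_t check_aligned_writes(volatile uint32_t *base, size_t bytes,
                            uint32_t pattern, size_t *visited)
{
    uintptr_t start = (uintptr_t)base;
    uintptr_t end = start + bytes;
    /* Round up to the first address with the lower 20 bits clear. */
    uintptr_t addr = (start + LOW_BITS_MASK) & ~LOW_BITS_MASK;
    size_t failures = 0;

    *visited = 0;
    for (; addr + sizeof(uint32_t) <= end; addr += LOW_BITS_MASK + 1) {
        volatile uint32_t *p = (volatile uint32_t *)addr;

        *p = pattern;
        if (*p != pattern)
            failures++;
        (*visited)++;
    }
    return failures;
}
```

Because the pattern only touches one word per megabyte, it costs almost
nothing to run over whatever memory is free, which is the point made
above: covering the last 1/8 adds little for this class of error.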

If your "fault model" assumes "stuck at x" errors at "random"
locations throughout memory, you're going to miss 12.5% of the
errors when you leave a large kernel in place. Those are the errors
that don't cause crashes. The BIOS will "flunk" that memory.
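
The 12.5% figure is simply the untested fraction of memory under that
fault model. A small sketch of the arithmetic in C (the function name
missed_fraction is made up for illustration, and it assumes faults are
spread uniformly over memory):

```c
/* Fraction of uniformly distributed faults that land in memory the tester
 * cannot touch (kernel plus test program). Sizes are in megabytes. */
double missed_fraction(double reserved_mb, double total_mb)
{
    return reserved_mb / total_mb;
}
```

For the case above, missed_fraction(1.0, 8.0) gives 0.125, i.e. 12.5%.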

My experience is that my current memory tester catches around 10% of
the bad memory. It catches ALL
- stuck-at faults,
- single coupling errors,
- address decoder errors.
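
For reference, a march test is one well-known way to cover these three
fault classes. The sketch below is a word-wide March C- in C. It is not
the tester described above; the function names are made up, and with
only all-zeroes and all-ones backgrounds it will not see coupling
between bits of the same word.

```c
#include <stddef.h>
#include <stdint.h>

/* One march element: walk the region in the given direction, check that
 * each word holds `expect`, then overwrite it with `write`.
 * Returns the number of mismatches seen. */
static size_t march_element(volatile uint32_t *mem, size_t words,
                            int ascending, uint32_t expect, uint32_t write)
{
    size_t errors = 0;

    for (size_t n = 0; n < words; n++) {
        size_t i = ascending ? n : words - 1 - n;

        if (mem[i] != expect)
            errors++;
        mem[i] = write;
    }
    return errors;
}

/* March C-: (w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); (r0).
 * Returns the total number of mismatches; 0 means no fault was seen. */
size_t march_c_minus(volatile uint32_t *mem, size_t words)
{
    const uint32_t zero = 0, ones = 0xffffffffu;
    size_t errors = 0;

    for (size_t i = 0; i < words; i++)      /* initialise everything to 0 */
        mem[i] = zero;

    errors += march_element(mem, words, 1, zero, ones);
    errors += march_element(mem, words, 1, ones, zero);
    errors += march_element(mem, words, 0, zero, ones);
    errors += march_element(mem, words, 0, ones, zero);

    for (size_t i = 0; i < words; i++)      /* final read-back of 0 */
        if (mem[i] != zero)
            errors++;

    return errors;
}
```

The volatile qualifier only keeps the compiler from dropping the
accesses. On real hardware the caches would also have to be disabled or
flushed, otherwise the reads never reach the DRAM.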

This discussion has turned up a few new error causes. Once you know a
possible cause, it is not too hard to design a test for it.

Roger.