Re: [PATCH] NMI trap revised (was Re: NMI errors in 2.0.30??)

Gabriel Paubert (paubert@iram.es)
Thu, 8 May 1997 20:36:57 +0200 (METDST)


> On May 05, 1997 at 10:10:37PM, Riccardo Facchetti wrote:
>
> > + /*
> > + * May be sort out what memory chip is failing ?
> > + * Heh ... with parity memory we can be a good memory
> > + * test program too :)
> > + * It should be something like:
> > + *
> > + * (1) disable NMI interrupts writing 1 in bit 7 of
> > + * port 0x70
> > + * (2) reset the NMI memory parity error flag (bit 7)
> > + * toggling bit 2 of 0x61 port to 1 and then to 0
> > + * (3) while all flat memory is tested:
> > + * (4) write 4Kb page in memory

Certainly not: you will not catch a memory _parity_ error by overwriting the
current memory contents, since a write simply stores fresh parity along with
the new data. This is especially true if the error was caused by a very rare
event in your machine (ECC may be different). You catch it by reading the
memory again, which also makes the algorithm non-destructive! Then you can
try to put the bad page on the bad memory list and recover. Don't forget,
however, that the bad data may already have been copied into an L2 and/or L1
cache line. Perhaps test the whole memory twice, just in case...
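
To make the idea concrete, here is a rough user-space sketch in C of a
read-only scan. It is only an illustration under stated assumptions, not
kernel code: a test program cannot provoke real parity errors, so the fault is
simulated. The names struct sim_memory, sim_read_page(), scan_by_reading() and
the faulty_page field are all hypothetical. In a real handler the flag test
would instead look at the parity error bit (bit 7 of port 0x61) mentioned in
the quoted comment, and the cache problem would still need separate care. The
scan never writes to the memory under test, and the pass count corresponds to
the "test twice" suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u

/* Hypothetical stand-in for physical memory plus the parity error latch. */
struct sim_memory {
    const uint8_t *data;   /* memory under test; never written by the scan */
    size_t npages;         /* number of SIM_PAGE_SIZE pages in data */
    size_t faulty_page;    /* page whose reads set the flag; SIZE_MAX = none */
    int parity_flag;       /* simulated "parity error pending" latch */
};

/* Read every byte of one page. On real hardware the read itself is what
 * makes the memory controller check parity; here the simulated latch is
 * set whenever the designated faulty page is read. */
static void sim_read_page(struct sim_memory *m, size_t page)
{
    const volatile uint8_t *p = m->data + page * SIM_PAGE_SIZE;
    uint8_t sink = 0;
    size_t i;

    for (i = 0; i < SIM_PAGE_SIZE; i++)
        sink ^= p[i];
    (void)sink;

    if (page == m->faulty_page)
        m->parity_flag = 1;
}

/* Scan all pages `passes` times by reading only. Each page that raises the
 * flag is recorded once in bad[] (up to max_bad entries). Returns the number
 * of distinct bad pages recorded. */
size_t scan_by_reading(struct sim_memory *m, unsigned passes,
                       size_t *bad, size_t max_bad)
{
    size_t nbad = 0;
    size_t page, i;
    unsigned pass;

    assert(m != NULL && bad != NULL);

    for (pass = 0; pass < passes; pass++) {
        for (page = 0; page < m->npages; page++) {
            m->parity_flag = 0;          /* clear the latch before the read */
            sim_read_page(m, page);
            if (!m->parity_flag)
                continue;                /* page read back cleanly */

            /* Record the page unless an earlier pass already did. */
            for (i = 0; i < nbad; i++)
                if (bad[i] == page)
                    break;
            if (i == nbad && nbad < max_bad)
                bad[nbad++] = page;
        }
    }
    return nbad;
}
```

A caller would fill in a struct sim_memory, call scan_by_reading(&m, 2, bad,
max_bad) for two passes, and then hand the recorded page numbers to whatever
keeps the bad memory list.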

> > + * (5) test if any NMI is pending: if yes, the
> > + * last page written is bogus, printk its
> > + * address.
> > + * (6) ++ page
> > + * (7) panic() out: we have no more things other that
> > + * raw kernel, running on this machine now.
> > + *
> > + * In (4) we should care not to overwrite the kernel
> > + * because I suspect we need it at least for printk()
> > + * and panic()
> > + */

Gabriel