Re: KMSGDUMP: dump kernel messages to a diskette

Willy Tarreau (willy@novworld.Novecom.Fr)
Thu, 8 Jul 1999 23:13:36 +0200 (CEST)


> > Well, in that case you could use 0xa, which is handled by all i386 and
> > newer BIOSes, AFAIK, and requires less messing. Neither are safe, anyway,
> > because they require RAM contents to be valid which might not be the case.
> > And executing from clobbered RAM mau cause further destruction -- e.g.
> > some unwanted hard disk writes.
>
> md5 the important parts of memory, and then checksum verification before
> using them. quite simple, and 100% guarantees we don't use clobbered ram.

what if the md5-verifying function is clobbered itself ?

> > Writing "to recover" I meant to start the boot loader, which might in
> > turn dump the memory to swap, for example, just as it's done by OSes for
> > non-ia32 platforms.
>
> This is what I was thinking too. Can we load lilo, which in turn dumps
> memory to swap partition?

I think that could be possible, but may be both good and bad :

- it may be good because lilo's code is now far more accurate than my
ASM.
- lilo already has informations about disks, boot files (kernel) ...
- the disk from which lilo boots the kernel is obviously accessible from
the bios
- but : it's not good that lilo completely re-runs a kernel after a crash,
while the bios may not have completely cleant everything. So we'll have
to maintain a special flag to force it to dump, and prevent it from
booting. If that flag gets a problem during the dump or anything, it
would be very annoying that lilo can't boot anymore because it would
always try to dump a crash. Linux mustn't be as crash-sensitive as NT,
IMHO.
- if we use the bios to reboot, we loose the buffer in RAM.

A quick reflexion on this makes me think about a scenario like this one :

- upon crash, the messages are immediately dumped to sector 2 of the
first hard disk known to the bios. There should be at least one for lilo.
This space is nearly never used (except by viruses) and can handle several
sectors of data.
- the system reboots through the bios.
- lilo sees anormal data in these sectors and copies them itself to the
right place, while marking them read.
- it then reboots completely (hard reboot) to ensure better cleaning.
- when it gets back, it sees the sectors marked as "read" so it boots
normally.

Obviously, dangerous points would be :
1) direct dump to the beginning of the disk. If the dump code is corrupted,
noone can tell what could happen
2) it's very important that lilo never enters in a "dump loop" and can boot
normaly, what ever error could have been detected during the last dump
attempt.

Comments ?

>
> -Dan
>

Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/