[PATCH][WIP] Using kexec for crash dumps in LKCD

From: Suparna Bhattacharya (suparna@in.ibm.com)
Date: Thu Feb 06 2003 - 10:56:15 EST


This is an extension to LKCD that uses Eric
Biederman's kexec implementation to defer the actual
writeout of a crash dump to disk until after a
memory-preserving reboot into a new kernel.

The real thanks for this goes to Dave Winchell and the
rest of the Mission Critical Linux folks for first
implementing such an approach in mcore using Werner
Almesberger's bootimg, and for letting us learn from it
and borrow ideas.

There is a subtle but crucial difference in the design
of the scheme we use to obtain spare pages for saving the
dump. It potentially enables us to save a complete memory
snapshot (not just kernel pages), provided we get good
compression efficiency: in theory the limit is set only
by how compressible the memory state is and by the working
memory that must be left for the dump and kernel bootup code.
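
A rough way to see that trade-off, under a deliberately simplified model that is an assumption and not LKCD's actual page accounting: the compressed snapshot plus the working memory reserved for the dump code and the new kernel's bootup must fit in physical memory. The function and parameter names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified feasibility check (an assumption, not LKCD's real logic):
 * a full snapshot can stay in memory if its compressed size plus the
 * pages reserved for the dump code and the new kernel's bootup fit in
 * physical memory.
 *
 * compress_pct: compressed size as a percentage of the original,
 * e.g. 40 means the image shrinks to 40% of its size.
 */
static bool full_snapshot_fits(uint64_t total_pages,
                               uint64_t reserved_pages,
                               unsigned compress_pct)
{
    assert(total_pages > 0);
    if (reserved_pages >= total_pages)
        return false;
    /* Round up: a partly filled page still occupies a whole page. */
    uint64_t compressed_pages =
        (total_pages * compress_pct + 99) / 100;
    return compressed_pages <= total_pages - reserved_pages;
}
```

For example, with 262144 pages (1 GB of 4 KB pages), 16384 pages reserved and 40% compression, 104858 pages are needed and 245760 are available, so the check passes; with incompressible memory (100%) it fails.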

This code is still somewhat raw, and I have a list of
TODOs, improvements, and loopholes to fix in mind, but I
decided it was high time to put this out as a start, so
that anyone who is interested can start taking a look,
playing with it, and maybe helping out if they like.

I plan to fold it into lkcd CVS tomorrow if possible, unless
anyone notices a major regression in existing lkcd
functionality (i.e. without CRASH_DUMP_MEMDEV and
CRASH_DUMP_SOFTBOOT). As a sanity check, I have tried
Alt+Sysrq+d and a simple panic from a module.

(I haven't tried it out for a true panic yet - going there
bit by bit :))

In any case, I'll tag the cvs tree before checking in.

Merging and testing has been rather time consuming, so
I would appreciate it if anyone planning to check in changes
before I do would let me know ahead of time.

I'm also considering checking in a TODO file at the
top of the 2.5 directory in CVS to keep track of what
needs to be done. Would that be a good idea?
I'll probably also post the TODOs on the mailing list.

OK, going ahead:

Steps to use:
--------------

A. Patching the kernel:
1) Patch a vanilla 2.5.59 kernel with the kexec patches for
   2.5.59. I picked the ones from the OSDL site that Andy
   Pfiffer mentioned in an earlier post:

     kexec for 2.5.59 (based upon the version for 2.5.54)
     http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1442

     hwfixes that make it work for me (same as for 2.5.58):
     http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1444

2) Apply the latest dump patches from lkcd CVS, i.e. apply
   the kernel patches under 2.5/patches and copy the dump
   driver files to the appropriate places.
   (Expect one reject in the 2nd hunk for reboot.c when
   applying notify_die.patch; you can ignore it for now.)

3) Apply the attached patch (kexecdump.patch)

B. Kernel build configuration settings
   CRASH_DUMP must be built into the kernel (not as a
   module) to be able to dump across a kexec boot.
   CRASH_DUMP_BLOCKDEV and CRASH_DUMP_COMPRESS_GZIP are
   needed as they are today.
   You'll also need the new options CRASH_DUMP_MEMDEV
   (memory dump driver) and CRASH_DUMP_SOFTBOOT (kexec-based
   dumping).

C. Run-time setup
   A new dump flag for memory-save-and-dump-after-boot,
   DUMP_FLAGS_SOFTBOOT (0x2), has been introduced; it needs
   to be turned on in the dump flags.
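
Turning the flag on is a plain bitwise OR, so the other dump flag bits are left alone. A minimal sketch: only the 0x2 value comes from this post, while the helper names and the 32-bit width of the flags word are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Value from the post; the 32-bit flags word is an assumption. */
#define DUMP_FLAGS_SOFTBOOT 0x2u

/* Hypothetical helper: set the soft-boot bit, keep all other bits. */
static uint32_t enable_softboot(uint32_t dump_flags)
{
    return dump_flags | DUMP_FLAGS_SOFTBOOT;
}

/* Hypothetical helper: is memory-save-and-dump-after-boot enabled? */
static int softboot_enabled(uint32_t dump_flags)
{
    return (dump_flags & DUMP_FLAGS_SOFTBOOT) != 0;
}
```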

   After running lkcd config as usual, there is one
   extra step: loading the kernel to be kexec'ed.
   This involves executing "kexec -l" with the regular
   command line options (derived from your /proc/cmdline)
   plus one extra boot parameter, obtained as follows:
   crashdump=`cat /proc/sys/kernel/dump/addr`
   (This tells the new kernel where to find the crash dump
   saved in memory during the previous boot.)

   e.g.
   kexec -l --command-line="root=806 console=tty0 \
   console=ttyS0,38400 crashdump=`cat /proc/sys/kernel/dump/addr`" \
   <kernel bzImage>
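
Building that command line amounts to string assembly. A sketch, assuming the contents of /proc/cmdline and /proc/sys/kernel/dump/addr have already been read into strings; the function names are hypothetical and the values in the test are illustrative, not real dump addresses. It also drops any crashdump= already present, on the assumption that the running kernel may itself have been started by an earlier soft boot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of s without trailing spaces/newlines (as left by /proc reads). */
static size_t trimmed_len(const char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == ' '))
        n--;
    return n;
}

/*
 * Hypothetical helper: build the --command-line value for "kexec -l"
 * from every token of the current command line except an old
 * crashdump=..., followed by crashdump=<dump_addr>.
 * Returns 0 on success, -1 if out is too small.
 */
static int build_kexec_cmdline(const char *cmdline, const char *dump_addr,
                               char *out, size_t outlen)
{
    size_t used = 0;
    const char *p = cmdline;

    assert(cmdline != NULL && dump_addr != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (;;) {
        while (*p == ' ' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        const char *tok = p;
        while (*p != '\0' && *p != ' ' && *p != '\n')
            p++;
        size_t len = (size_t)(p - tok);
        /* Skip a crashdump= left over from a previous soft boot. */
        if (len >= 10 && strncmp(tok, "crashdump=", 10) == 0)
            continue;
        int n = snprintf(out + used, outlen - used, "%.*s ", (int)len, tok);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }

    int n = snprintf(out + used, outlen - used, "crashdump=%.*s",
                     (int)trimmed_len(dump_addr), dump_addr);
    return (n < 0 || (size_t)n >= outlen - used) ? -1 : 0;
}
```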

D. On panic, the dump is saved in memory and kexec is then
   used to boot a new kernel (instead of a regular reboot).
   If Alt+Sysrq+d is pressed, the dump is just saved in
   memory without rebooting.

   [Note: The first few times you try it, it might be a
   good idea to drop into "init 1" and unmount most filesystems,
   or remount them read-only, before you force the panic.
   Thanks to Andy Pfiffer for the tip.]
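
The decision described in D can be summarized as below. The enum and function names are hypothetical and only model the behavior described here; ACTION_CLASSIC_DUMP stands for the existing LKCD path when DUMP_FLAGS_SOFTBOOT is off.

```c
#include <assert.h>
#include <stdint.h>

#define DUMP_FLAGS_SOFTBOOT 0x2u  /* value given in the post */

/* Hypothetical model of the two triggers mentioned in the post. */
enum dump_trigger { TRIGGER_PANIC, TRIGGER_SYSRQ_D };

enum dump_action {
    ACTION_CLASSIC_DUMP,    /* soft boot off: existing LKCD behaviour */
    ACTION_SAVE_IN_MEMORY,  /* keep the dump in RAM, keep running */
    ACTION_SAVE_THEN_KEXEC, /* keep the dump in RAM, kexec the loaded kernel */
};

static enum dump_action choose_dump_action(uint32_t dump_flags,
                                           enum dump_trigger trigger)
{
    if (!(dump_flags & DUMP_FLAGS_SOFTBOOT))
        return ACTION_CLASSIC_DUMP;
    /* Only a panic needs a new kernel; Alt+Sysrq+d leaves the system up. */
    return trigger == TRIGGER_PANIC ? ACTION_SAVE_THEN_KEXEC
                                    : ACTION_SAVE_IN_MEMORY;
}
```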

E. After the new kernel boots, running "lkcd config" triggers
   a writeout of the previously saved in-memory dump to the
   dump disk.

F. From here on, one can run "lkcd save" as usual to generate
   the /var/log/dump/* files for analysis.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India




