[RFC PATCH 0/4 v1] LPC materials: livedump

From: Lukas Hruska
Date: Fri Nov 10 2023 - 12:53:44 EST


Quick note
----------

This patchset is posted primarily as material for a presentation at the
Linux Plumbers Conference. I will appreciate any feedback you can
provide, whether in person at the conference or here. It continues the
development of a patch by YOSHIDA Masanori that has not been updated
for a long time. The last version was v3, see [1].


Summary
-------

The Linux kernel currently has a mechanism to create a dump of the whole
memory, with the help of a crash kernel, for further debugging of an
observed issue. Unfortunately, this cannot be done without restarting the
host. That is a problem when a high-availability service runs on a system
experiencing a complex issue that cannot be debugged without a complete
memory dump, and hypervisor-assisted dumps are not an option on bare-metal
setups. For this purpose, a live dump mechanism is being developed. It was
initially introduced by YOSHIDA Masanori [1] in 2012. That PoC was already
able to create a consistent image of memory and supported dumping the data
to a reserved raw block device.


Mechanism overview
------------------

Live Dump is based on the copy-on-write technique. Processing is
performed in the following order:
(1) Suspend processing on all CPUs.
(2) Make the pages to be dumped read-only.
(3) Dump hard-to-handle pages (those that cannot fault).
(4) Resume all CPUs.
(5) On a page fault, dump the faulting page.
(6) Finally, dump the remaining pages that were not updated.
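
To make the ordering of the six steps concrete, here is a minimal user-space
simulation. It is only a sketch under stated assumptions, not code from the
patchset: memory is a small array of fake pages, "write protection" is a
boolean flag, and stopping/resuming CPUs is not modelled at all. All names
(struct sim, sim_start, sim_write, sim_finish) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  8   /* number of simulated pages (arbitrary) */
#define PAGE_SZ 16  /* simulated page size in bytes (arbitrary) */

/* Hypothetical model of memory being snapshotted. */
struct sim {
	unsigned char mem[NPAGES][PAGE_SZ];  /* live memory */
	unsigned char dump[NPAGES][PAGE_SZ]; /* snapshot image */
	bool wrprotected[NPAGES];  /* page is read-only, not yet dumped */
	bool cannot_fault[NPAGES]; /* "hard-to-handle" page */
	bool dumped[NPAGES];
};

/* Copy one page into the image and lift its write protection. */
static void dump_page(struct sim *s, size_t pfn)
{
	assert(pfn < NPAGES && !s->dumped[pfn]);
	memcpy(s->dump[pfn], s->mem[pfn], PAGE_SZ);
	s->dumped[pfn] = true;
	s->wrprotected[pfn] = false;
}

/* Steps (1)-(4): protect faultable pages, dump the others up front. */
static void sim_start(struct sim *s)
{
	for (size_t i = 0; i < NPAGES; i++) {
		s->dumped[i] = false;
		s->wrprotected[i] = !s->cannot_fault[i];
	}
	for (size_t i = 0; i < NPAGES; i++)
		if (s->cannot_fault[i])
			dump_page(s, i);
}

/* Step (5): a write to a protected page dumps the old contents first. */
static void sim_write(struct sim *s, size_t pfn, size_t off,
		      unsigned char val)
{
	assert(pfn < NPAGES && off < PAGE_SZ);
	if (s->wrprotected[pfn])
		dump_page(s, pfn); /* stands in for the page fault path */
	s->mem[pfn][off] = val;
}

/* Step (6): sweep pages nobody wrote to; returns how many were swept. */
static size_t sim_finish(struct sim *s)
{
	size_t swept = 0;

	for (size_t i = 0; i < NPAGES; i++) {
		if (!s->dumped[i]) {
			dump_page(s, i);
			swept++;
		}
	}
	return swept;
}
```

After sim_finish() the image holds each page as it was at sim_start(),
regardless of writes made in between, which is the consistency property
the copy-on-write scheme is after.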

The page fault handler sends a dump request to a queue handled by the
"livedump" kthread, which is in charge of dumping to disk. If the queue
ever becomes full, livedump simply fails, since livedump's page fault
handler can never sleep to wait for space.
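
The fail-when-full behaviour can be illustrated with a bounded queue whose
enqueue never blocks. This is a single-threaded sketch for illustration
only; the real queue is shared between the fault handler and the kthread and
needs proper synchronization, which is omitted here. The names (struct
dump_queue, dq_try_enqueue, dq_dequeue) and the capacity are hypothetical
and not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 4 /* arbitrary capacity for the example */

/* Hypothetical fixed-size ring of pending dump requests. */
struct dump_queue {
	unsigned long pfn[QUEUE_CAP];
	size_t head;  /* index of the oldest request */
	size_t count; /* number of queued requests */
	bool failed;  /* set once an enqueue found the queue full */
};

/*
 * Producer side (stands in for the page fault handler).
 * Never waits: a full queue marks the whole dump as failed.
 */
static bool dq_try_enqueue(struct dump_queue *q, unsigned long pfn)
{
	assert(q != NULL);
	if (q->failed)
		return false;
	if (q->count == QUEUE_CAP) {
		q->failed = true;
		return false;
	}
	q->pfn[(q->head + q->count) % QUEUE_CAP] = pfn;
	q->count++;
	return true;
}

/* Consumer side (stands in for the "livedump" kthread). */
static bool dq_dequeue(struct dump_queue *q, unsigned long *pfn)
{
	assert(q != NULL && pfn != NULL);
	if (q->count == 0)
		return false;
	*pfn = q->pfn[q->head];
	q->head = (q->head + 1) % QUEUE_CAP;
	q->count--;
	return true;
}
```

The failure is sticky in this sketch: once one request has been lost the
image can no longer be consistent, so later requests are rejected too.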


TODO
----
- Large page support
  Currently livedump can dump only 4K pages, so it splits all
  pages in kernel space in advance. This may cause a big TLB overhead.
- Other target storage support
  Currently livedump can dump only to a block device. In practice,
  dumping to a normal file is necessary.
- Other space/area support
  Currently livedump write-protects only the kernel's straight mapping
  area. Pages in the vmap area cannot be dumped consistently.
- Other CPU architecture support
  Currently livedump supports only x86-64.
- Testing
  A testbench and measurements to provide guarantees about the
  (non)intrusiveness of the livedump mechanism under certain conditions.


Summary of changes since 2012 version
-------------------------------------
- rebase for v6.2
- fs/vmcore code modification to be reused by livedump
- memdump output change to ELF format
- crash tool modification not needed anymore
- all loops through PFNs replaced with pagewalk
- 5-level paging support
- multiple bitmaps handling page faults for correct restoration of PTE state
- rewrite API from ioctls to sysfs


[1] https://lore.kernel.org/r/20121011055356.6719.46214.stgit@xxxxxxxxxxxxxxxxxxxxxxx/

YOSHIDA Masanori (1):
livedump: Add memory dumping functionality

Lukas Hruska (3):
crash/vmcore: VMCOREINFO creation from non-kdump kernel
livedump: Add write protection management
livedump: Add tools to make livedump creation easier

arch/x86/Kconfig | 29 ++
arch/x86/include/asm/wrprotect.h | 39 ++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/fault.c | 8 +
arch/x86/mm/wrprotect.c | 744 +++++++++++++++++++++++++++++
fs/proc/vmcore.c | 57 +--
include/linux/crash_dump.h | 2 +
kernel/Makefile | 1 +
kernel/crash_core.c | 10 +-
kernel/crash_dump.c | 38 ++
kernel/livedump/Makefile | 2 +
kernel/livedump/core.c | 262 ++++++++++
kernel/livedump/memdump.c | 525 ++++++++++++++++++++
kernel/livedump/memdump.h | 32 ++
kernel/livedump/memdump_trace.h | 30 ++
tools/livedump/livedump.sh | 44 ++
tools/livedump/livedump_extract.sh | 19 +
17 files changed, 1803 insertions(+), 41 deletions(-)
create mode 100644 arch/x86/include/asm/wrprotect.h
create mode 100644 arch/x86/mm/wrprotect.c
create mode 100644 kernel/livedump/Makefile
create mode 100644 kernel/livedump/core.c
create mode 100644 kernel/livedump/memdump.c
create mode 100644 kernel/livedump/memdump.h
create mode 100644 kernel/livedump/memdump_trace.h
create mode 100755 tools/livedump/livedump.sh
create mode 100755 tools/livedump/livedump_extract.sh