RFC design of device coredump collection on panic in Pstore

From: Mukesh Ojha
Date: Mon May 08 2023 - 11:51:48 EST


Hi All,

We are exploring a way where device driver(registered/interested and optional could be
callback) can dump their data(consist of list of buffers or hardware registers) for post
mortam debugging of a crash via dumping the content to pstore region. Some of the example
data could be like clock dump/regulators/ etc.

These stuff should be already part of entire RAM dump but in some cases it's however not
feasible to capture the entire content of RAM, so was thinking if pstore region can be
used to capture above information either in the form of elf or binary and how pstore
can get this in human readable format can be discussed further.

Also, existing devcoredump driver does not cover panic usecases so we thought of something
like below RFC design where,

1. Device_coredump allocates some configurable contigous memory that can be controlled
via CONFIG or bootargs and later registers for panic notifiers.
2. Notifier gets added.
3. Pstore adds device_coredump as its front-end via dumper registration similar
to kmsg being dump today.
4. Successful registration of dumper.
5. A device driver(A-Z) can register their buffer to be dumped as part of panic.
6. buffer gets added to the dump list.
7. Panic occurs.
8. iterate over registered drivers and copy their dump list to its own memory and if
it crosses device core dump memory log an error stop iterating.
9. Similar to devcore_dump() inline with kmsg_dump()
10.Copy the content to pstore region and this could be elf or raw binary that can be
discussed.


Device coredump memory(1) size could be passed from pstore and should be same as size
of devcoredump frontend size given in DT or some other way.

Let me know your concern and view on this.


pstore device_coredump deviceA..Z panic
┼ │
│ │ panic_notifier(1) │
│ ├───────────────────┬─────────────────►│
│ │ notifier added │ (2) │
│ │◄──────────────────┼──────────────────┤
pstore can give │ (3) │ │ │
its region for │ dumper_registration │ │ │
dump ├─────────────────────►│ │ │
│ (4) │ │ │
│◄─────────────────────┤ │ │
│ register_dumper │ │ │
│ │ │ │
│ │ │ │
│ │ │ init │
│ │ ───┼── │
│ │ (5) │ │
│ │devcoredump_register(dev,buf)); │
│ │◄──────────────────┐ │
│ │ (6) │ │
│ ├──────────────────►│ │
│ │device/buf gets add│d │
│ │ to the list │ │panic
│ │ │ ──┼───
│ │ │ │
│ │(7) panic_notifier │ call │
│ │◄──────────────────┼──────────────────┤
│ │ prepare coredump │for the deviceA-Z │
│ ├──────────────────►├────┐ │
│ │ (8) │ │ │
│ │ │ │ │
│ │ │ │ │
│ │ │◄───┘ │
dumper will │ │ (9) │dev_coredump()
write all the device dump to │◄──────────────────┼──────────────────┤
pstore ┌◄───┬─────────────────┤ │ │
region │ │ (10) │ │ │
│ │ │ │ │
│ │ │ │ │
│◄───┘ │ │ │
│ │ │ │
│ │



Regards,
Mukesh

--
2.7.4