Re: [PATCH v4 00/21] Add Qualcomm Minidump kernel driver related support

From: Mukesh Ojha
Date: Fri Jun 30 2023 - 12:05:03 EST




On 6/29/2023 4:42 AM, Rob Herring wrote:
On Wed, Jun 28, 2023 at 9:45 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

On Wed, Jun 28, 2023 at 06:04:27PM +0530, Mukesh Ojha wrote:
Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs. Hence, the
ability to collect accurate data is only a best-effort. The data collected
could be invalid or corrupted, data collection itself could fail, and so on.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

Minidump kernel driver implementation is divided into two parts for
simplicity, one is minidump core which can also be called minidump
frontend(As API gets exported from this driver for registration with
backend) and the other part is minidump backend i.e, where the underlying
implementation of minidump will be there. There could be different way
how the backend is implemented like Shared memory, Memory mapped IO
or Resource manager(gunyah) based where the guest region information is
passed to hypervisor via hypercalls.

Minidump Client-1 Client-2 Client-5 Client-n
| | | |
| | ... | ... |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| +---+--------------+----+ |
+-----------+ qcom_minidump(core) +--------+
| |
+------+-----+------+---+
| | |
| | |
+---------------+ | +--------------------+
| | |
| | |
| | |
v v v
+-------------------+ +-------------------+ +------------------+
|qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
| | | | | |
+-------------------+ +-------------------+ +------------------+
Shared memory Memory mapped IO Resource manager
(backend) (backend) (backend)


Here, we will be giving all analogy of backend with SMEM as it is the
only implemented backend at present but general idea remains the same.

If you only have one "backend" then you don't need the extra compexity
here at all, just remove that whole middle layer please and make this
much simpler and smaller and easier to review and possibly accept.

pstore already supports backends. Why aren't the above backends just
pstore backends rather than having an intermediate pstore backend in
RAM which then somehow gets moved into these minidump backends.

It can't be another pstore backend since, pstore backend(ram) is for
the system where there is a guarantees of fixed ram range content persist across boot, but that is not true with minidump, there is no
pstorefs kind of support with minidump.

Instead, the whole idea of backend/front-end of minidump here, is
entirely related to minidump and here minidump backend could be
anything like could be Shared memory in DDR which is shared across
multiple subsystem (CPUSS) is one of them or it could be something
where you want to collect minidump for a guest vm which guest
does not have access to backend instead it may be the hypervisor
do the register the region for the guest.

Pstore(ram) is one of the clients of minidump where, we want to collect
the console/pmsg/ftrace/dmesg logs for production devices where
DDR dump is difficult and the reason of doing this to not re-invent
the wheel and it just need physical address/size .

+---------+ +---------+ +--------+ +---------+
| console | | pmsg | | ftrace | | dmesg |
+---------+ +---------+ +--------+ +---------+
| | | |
| | | |
+------------------------------------------+
|
\ /
+----------------+
(1) |pstore frontends|
+----------------+
|
\ /
+------------------- +
(2) | pstore backend(ram)|
+--------------------+
|
\ /
+--------------------+
(3) |qcom_pstore_minidump|
+--------------------+

-Mukesh

We don't add layers when they are not needed, and never when there is no
actual user. If you need the extra "complexity" later, then add it
later when it is needed as who knows when that will ever be.

Please redo this series based on that, thanks.

My bigger issue with this whole series is what would this all look
like if every SoC vendor upstreamed their own custom dumping
mechanism. That would be a mess. (I have similar opinions on the
$soc-vendor hypervisors.)

Rob