Re: [PATCH v4 04/21] soc: qcom: Add Qualcomm APSS minidump (frontend) feature support

From: Andy Shevchenko
Date: Wed Jun 28 2023 - 09:44:25 EST


On Wed, Jun 28, 2023 at 3:35 PM Mukesh Ojha <quic_mojha@xxxxxxxxxxx> wrote:
>
> Minidump is a best effort mechanism to collect useful and predefined
> data for first level of debugging on end user devices running on
> Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> or subsystem part of SoC crashes, due to a range of hardware and
> software bugs. Hence, the ability to collect accurate data is only
> a best-effort. The data collected could be invalid or corrupted,
> data collection itself could fail, and so on.
>
> Qualcomm devices in engineering mode provides a mechanism for
> generating full system ramdumps for post mortem debugging. But in some
> cases it's however not feasible to capture the entire content of RAM.
> The minidump mechanism provides the means for selecting region should
> be included in the ramdump. The solution supports extracting the
> ramdump/minidump produced either over USB or stored to an attached
> storage device.
>
> Minidump kernel driver implementation is divided into two parts for
> simplicity, one is minidump core which can also be called minidump
> frontend(As API gets exported from this driver for registration with
> backend) and the other part is minidump backend i.e, where the underlying
> implementation of minidump will be there. There could be different way
> how the backend is implemented like Shared memory, Memory mapped IO
> or Resource manager based where the guest region information is passed
> to hypervisor via hypercalls.
>
> Minidump Client-1 Client-2 Client-5 Client-n
> | | | |
> | | ... | ... |
> | | | |
> | | | |
> | | | |
> | | | |
> | | | |
> | | | |
> | +---+--------------+----+ |
> +-----------+ qcom_minidump(core) +--------+
> | |
> +------+-----+------+---+
> | | |
> | | |
> +---------------+ | +--------------------+
> | | |
> | | |
> | | |
> v v v
> +-------------------+ +-------------------+ +------------------+
> |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
> | | | | | |
> +-------------------+ +-------------------+ +------------------+
> Shared memory Memory mapped IO Resource manager
> (backend) (backend) (backend)
>
> Here, we will be giving all analogy of backend with SMEM as it is the
> only implemented backend at present but general idea remains the same.

the general

>
> The core of minidump feature is part of Qualcomm's boot firmware code.
> It initializes shared memory (SMEM), which is a part of DDR and
> allocates a small section of it to minidump table i.e also called

the minidump

> global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) has
> their own table of segments to be included in the minidump, all
> references from a descriptor in SMEM (G-ToC). Each segment/region has
> some details like name, physical address and it's size etc. and it
> could be anywhere scattered in the DDR.
>
> qcom_minidump(core or frontend) driver adds the capability to add APSS
> region to be dumped as part of ram dump collection. It provides
> appropriate symbol register/unregister client regions.
>
> To simplify post mortem debugging, it creates and maintain an ELF
> header as first region that gets updated upon registration
> of a new region.

...

> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/kallsyms.h>

> +#include <linux/kernel.h>

Why?

And again a lot of missing headers, like

bug.h
dev_printk.h
errno.h
export.h
mutex.h
slab.h

> +#include <linux/module.h>
> +#include <linux/printk.h>
> +#include <linux/string.h>

...

> +/*
> + * In some of the Old Qualcomm devices, boot firmware statically allocates 300
> + * as total number of supported region (including all co-processors) in

regions

> + * minidump table out of which linux was using 201. In future, this limitation
> + * from boot firmware might get removed by allocating the region dynamically.
> + * So, keep it compatible with older devices, we can keep the current limit for

So, to keep...

> + * Linux to 201.
> + */

...

> +static struct elf_shdr *elf_shdr_entry_addr(struct elfhdr *ehdr, int idx)
> +{
> + struct elf_shdr *eshdr = (struct elf_shdr *)((size_t)ehdr + ehdr->e_shoff);

Interesting casting pointer to a size_t. Perhaps void * would work
more explicitly?
Ditto for all other cases like this.

> + return &eshdr[idx];
> +}

...

> + old_idx += strscpy((strtab + old_idx), name, MAX_REGION_NAME_LENGTH);

(Parentheses are not needed)

strscpy() might return a very big number in this case. Is it a problem?

...

> +unlock:

out_unlock: ?

Ditto for other similar cases.

> + mutex_unlock(&md_lock);
> + return ret;

...

> + /*
> + * Above are some prdefined sections/program header used

predefined

> + * for debug, update their count here.
> + */

...

> +#ifndef _QCOM_MINIDUMP_INTERNAL_H_
> +#define _QCOM_MINIDUMP_INTERNAL_H_

> +#include <linux/elf.h>

Not sure I see how it's used. You may provide forward declarations for
the pointers.

> +#include <soc/qcom/qcom_minidump.h>

+ kconfig.h for IS_ENABLED() ?

MIssing forward declaration:
struct device;

...

> #ifndef _QCOM_MINIDUMP_H_
> #define _QCOM_MINIDUMP_H_

+ types.h for phys_addr_t.

...

> + * @size: Number of byte to dump from @address location,

bytes

> + * and it should be 4 byte aligned.

--
With Best Regards,
Andy Shevchenko