RE: [EXT] Re: [PATCH 5/7] coresight: tmc: Add support for reading tracedata from previous boot

From: Linu Cherian
Date: Wed Nov 08 2023 - 20:09:09 EST


Hi James,

> -----Original Message-----
> From: Linu Cherian <lcherian@xxxxxxxxxxx>
> Sent: Tuesday, October 10, 2023 6:53 PM
> To: James Clark <james.clark@xxxxxxx>; suzuki.poulose@xxxxxxx;
> mike.leach@xxxxxxxxxx; leo.yan@xxxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; coresight@xxxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; robh+dt@xxxxxxxxxx;
> krzysztof.kozlowski+dt@xxxxxxxxxx; conor+dt@xxxxxxxxxx;
> devicetree@xxxxxxxxxxxxxxx; Sunil Kovvuri Goutham
> <sgoutham@xxxxxxxxxxx>; George Cherian <gcherian@xxxxxxxxxxx>; Anil
> Kumar Reddy H <areddy3@xxxxxxxxxxx>
> Subject: RE: [EXT] Re: [PATCH 5/7] coresight: tmc: Add support for reading
> tracedata from previous boot
>
> Hi James,
>
> > -----Original Message-----
> > From: James Clark <james.clark@xxxxxxx>
> > Sent: Wednesday, October 4, 2023 7:18 PM
> > To: Linu Cherian <lcherian@xxxxxxxxxxx>; suzuki.poulose@xxxxxxx;
> > mike.leach@xxxxxxxxxx; leo.yan@xxxxxxxxxx
> > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; coresight@xxxxxxxxxxxxxxxx;
> > linux- kernel@xxxxxxxxxxxxxxx; robh+dt@xxxxxxxxxx;
> > krzysztof.kozlowski+dt@xxxxxxxxxx; conor+dt@xxxxxxxxxx;
> > devicetree@xxxxxxxxxxxxxxx; Sunil Kovvuri Goutham
> > <sgoutham@xxxxxxxxxxx>; George Cherian <gcherian@xxxxxxxxxxx>; Anil
> > Kumar Reddy H <areddy3@xxxxxxxxxxx>; Tanmay Jagdale
> > <tanmay@xxxxxxxxxxx>
> > Subject: [EXT] Re: [PATCH 5/7] coresight: tmc: Add support for reading
> > tracedata from previous boot
> >
> > On 03/10/2023 17:43, James Clark wrote:
> > >
> > >
> > > On 29/09/2023 14:37, Linu Cherian wrote:
> > >> * Introduce a new mode CS_MODE_READ_PREVBOOT for reading tracedata
> > >> captured in previous boot.
> > >>
> > >> * Add special handlers for preparing ETR/ETF for this special mode
> > >>
> > >> * User can read the trace data as below
> > >>
> > >> For example, for reading trace data from tmc_etf sink
> > >>
> > >> 1. cd /sys/bus/coresight/devices/tmc_etfXX/
> > >>
> > >> 2. Change mode to READ_PREVBOOT
> > >>
> > >> #echo 1 > read_prevboot
> > >>
> > >> 3. Dump trace buffer data to a file,
> > >>
> > >> #dd if=/dev/tmc_etrXX of=~/cstrace.bin
> > >>
> > >> 4. Reset back to normal mode
> > >>
> > >> #echo 0 > read_prevboot
> > >>
> > >> Signed-off-by: Anil Kumar Reddy <areddy3@xxxxxxxxxxx>
> > >> Signed-off-by: Tanmay Jagdale <tanmay@xxxxxxxxxxx>
> > >> Signed-off-by: Linu Cherian <lcherian@xxxxxxxxxxx>
> > >> ---
> > >> .../coresight/coresight-etm4x-core.c | 1 +
> > >> .../hwtracing/coresight/coresight-tmc-core.c | 81 +++++++++-
> > >> .../hwtracing/coresight/coresight-tmc-etf.c | 62 ++++++++
> > >> .../hwtracing/coresight/coresight-tmc-etr.c | 145 +++++++++++++++++-
> > >> drivers/hwtracing/coresight/coresight-tmc.h | 6 +
> > >> include/linux/coresight.h | 13 ++
> > >> 6 files changed, 306 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > >> b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > >> index 77b0271ce6eb..513baf681280 100644
> > >> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > >> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> > >> @@ -1010,6 +1010,7 @@ static void etm4_disable(struct
> > >> coresight_device *csdev,
> > >>
> > >> switch (mode) {
> > >> case CS_MODE_DISABLED:
> > >> + case CS_MODE_READ_PREVBOOT:
> > >> break;
> > >> case CS_MODE_SYSFS:
> > >> etm4_disable_sysfs(csdev);
> > >> diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c
> > >> b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > >> index 6658ce76777b..65c15c9f821b 100644
> > >> --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> > >> +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> > >> @@ -103,6 +103,45 @@ u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata)
> > >> return mask;
> > >> }
> > >>
> > >> +int tmc_read_prepare_prevboot(struct tmc_drvdata *drvdata) {
> > >> + int ret = 0;
> > >> + struct tmc_register_snapshot *reg_ptr;
> > >> + struct coresight_device *csdev = drvdata->csdev;
> > >> +
> > >> + if (!drvdata->metadata.vaddr) {
> > >> + ret = -ENOMEM;
> > >> + goto out;
> > >> + }
> > >> +
> > >> + reg_ptr = drvdata->metadata.vaddr;
> > >> + if (!reg_ptr->valid) {
> > >> + dev_err(&drvdata->csdev->dev,
> > >> + "Invalid metadata captured from previous boot\n");
> > >> + ret = -EINVAL;
> > >> + goto out;
> > >> + }
> > >
> > > I'm wondering if a more robust check is needed than the valid flag,
> > > like a checksum or something. I didn't debug it yet but I ended up
> > > with an invalid set of metadata after a panic reboot, see below. I'm
> > > not sure if it's just a logic bug or something got lost during the
> > > reboot, I didn't debug it yet. But I suppose unless you assume the
> > > panic didn't affect writing the metadata, then it could be partially
> > > written and shouldn't be trusted?
> > >
> > > [...]
> > >> +
> > >> +static int tmc_etr_sync_prevboot_buf(struct tmc_drvdata *drvdata) {
> > >> + u32 status;
> > >> + u64 rrp, rwp, dba;
> > >> + struct tmc_register_snapshot *reg_ptr;
> > >> + struct etr_buf *etr_buf = drvdata->prevboot_buf;
> > >> +
> > >> + reg_ptr = drvdata->metadata.vaddr;
> > >> +
> > >> + rrp = reg_ptr->rrp;
> > >> + rwp = reg_ptr->rwp;
> > >> + dba = reg_ptr->dba;
> > >> + status = reg_ptr->sts;
> > >> +
> > >> + etr_buf->full = !!(status & TMC_STS_FULL);
> > >> +
> > >> + /* Sync the buffer pointers */
> > >> + etr_buf->offset = rrp - dba;
> > >> + if (etr_buf->full)
> > >> + etr_buf->len = etr_buf->size;
> > >> + else
> > >> + etr_buf->len = rwp - rrp;
> > >> +
> > >> + /* Sanity checks for validating metadata */
> > >> + if ((etr_buf->offset > etr_buf->size) ||
> > >> + (etr_buf->len > etr_buf->size))
> > >> + return -EINVAL;
> > >
> > > The values I got here are 0x781b67182aa346f9 0x8000000 0x8000000 for
> > > offset, size and len respectively. This fails the first check. It
> > > would also be nice to have a dev_dbg here as well, it's basically
> > > the same as the valid check above which does have one.
> > >
> >
> > So I debugged it and the issue is that after the panic I was doing a
> > cold boot rather than a warm boot and the memory was being randomised.
> >
> > The reason that 0x8000000 seemed to be initialised is because they are
> > based on the reserved region size, rather than anything from the
> > metadata. When I examined the metadata it was all randomised.
> >
> > That leads me to think that the single bit for 'valid' is insufficient.
> > There is a simple hashing function in include/linux/stringhash.h that
> > we could use on the whole metadata struct, but that specifically says:
> >
> > * These hash functions are NOT GUARANTEED STABLE between kernel
> > * versions, architectures, or even repeated boots of the same kernel.
> > * (E.g. they may depend on boot-time hardware detection or be
> > * deliberately randomized.)
> >
> > Although I'm not sure how true the repeated boots of the same kernel
> > part is.
> >
> > Maybe something in include/crypto/hash.h could be used instead, or
> > make our own simple hash.
>
> Thanks for the pointers. Will take a look at it.

Since the purpose is to detect data corruption, crc32 (using the crc32_le API) looks okay to me. Any thoughts on this?
Maybe we could add crc32 checks for the trace data as well?
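
To make the idea concrete, here is a rough userspace sketch of the metadata check. It is only an illustration under a few assumptions: the struct layout and the extra crc field are hypothetical (only the rrp/rwp/dba/sts/valid field names come from the patch), the helper names tmc_metadata_seal() and tmc_metadata_ok() are made up, and crc32_le_sketch() is a bitwise stand-in so the snippet builds outside the kernel. In the driver this would presumably call crc32_le() from <linux/crc32.h> instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's crc32_le(seed, buf, len):
 * bitwise CRC32 with the reflected polynomial 0xedb88320.
 */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

/* Hypothetical layout; the real struct in the patch may differ. */
struct tmc_register_snapshot {
	uint64_t rrp;
	uint64_t rwp;
	uint64_t dba;
	uint32_t sts;
	uint32_t valid;
	uint32_t crc;	/* assumed new field: CRC of all bytes before it */
};

/* Make sure no hidden padding ends up inside the checksummed range. */
static_assert(offsetof(struct tmc_register_snapshot, crc) == 32,
	      "unexpected padding before crc");

/* CRC over everything that precedes the crc field itself. */
static uint32_t tmc_metadata_crc(const struct tmc_register_snapshot *md)
{
	return crc32_le_sketch(~0u, (const unsigned char *)md,
			       offsetof(struct tmc_register_snapshot, crc));
}

/* Called once the register snapshot has been written (e.g. panic path). */
static void tmc_metadata_seal(struct tmc_register_snapshot *md)
{
	md->valid = 1;
	md->crc = tmc_metadata_crc(md);
}

/* Called on the next boot before trusting rrp/rwp/dba/sts. */
static int tmc_metadata_ok(const struct tmc_register_snapshot *md)
{
	return md->valid && md->crc == tmc_metadata_crc(md);
}
```

With this, randomised memory after a cold boot would pass the check only if the stored crc happened to match, which should be rare enough for a corruption check, and a partially written snapshot would fail as well. The same helper could be run over the trace buffer if we decide to cover the trace data too.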

Thanks.



