Re: [PATCH] coresight: dynamic-replicator: Fix handling of multiple connections

From: Mike Leach
Date: Wed Apr 29 2020 - 10:28:07 EST


Hi,

On Wed, 29 Apr 2020 at 14:59, Sai Prakash Ranjan
<saiprakash.ranjan@xxxxxxxxxxxxxx> wrote:
>
> On 2020-04-29 19:19, Suzuki K Poulose wrote:
> > On 04/29/2020 12:47 PM, Sai Prakash Ranjan wrote:
> >> On 2020-04-28 17:53, Sai Prakash Ranjan wrote:
> >>> On 2020-04-27 19:23, Suzuki K Poulose wrote:
> >>>> On 04/27/2020 10:45 AM, Mike Leach wrote:
> >>> [...]
> >>>>>>
> >>>>>> This is not sufficient. You must prevent another session trying to
> >>>>>> enable the other port of the replicator as this could silently
> >>>>>> fail
> >>>>>> the "on-going" session. Not ideal. Fail the attempt to enable a
> >>>>>> port
> >>>>>> if the other port is active. You could track this in software and
> >>>>>> fail early.
> >>>>>>
> >>>>>> Suzuki
> >>>>>
> >>>>> While I have no issue in principle with not enabling a path to a
> >>>>> sink
> >>>>> that is not in use - indeed in some cases attaching to unused sinks
> >>>>> can cause back-pressure that slows throughput (cf TPIU) - I am
> >>>>> concerned that this modification is masking an underlying issue
> >>>>> with
> >>>>> the platform in question.
> >>>>>
> >>>>> Should we decide to enable the diversion of different IDs to
> >>>>> different
> >>>>> sinks or allow different sessions go to different sinks, then this
> >>>>> has
> >>>>> potential to fail on the SC7180 SoC - and it will be difficult in
> >>>>> future to associate a problem with this discussion.
> >>>>
> >>>> Mike,
> >>>>
> >>>> I think thats a good point.
> >>>> Sai, please could we narrow down this to the real problem and may be
> >>>> work around it for the "device" ? Do we know which sink is causing
> >>>> the
> >>>> back pressure ? We could then push the "work around" to the
> >>>> replicator
> >>>> it is connected to.
> >>>>
> >>>> Suzuki
> >>>
> >>> Hi Suzuki, Mike,
> >>>
> >>> To add some more to the information provided earlier,
> >>> swao_replicator(6b06000) and etf are
> >>> in AOSS (Always-On-SubSystem) group. Also TPIU(connected to
> >>> qdss_replicator) and EUD(connected
> >>> to swao_replicator) sinks are unused.
> >>>
> >>> Please ignore the id filter values provided earlier.
> >>> Here are ID filter values after boot and before enabling replicator.
> >>> As per
> >>> these idfilter values, we should not try to enable replicator if its
> >>> already
> >>> enabled (in this case for swao_replicator) right?
> >>>
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter0
> >>> 0x0
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6b06000.replicator/replicator1/mgmt/idfilter1
> >>> 0x0
> >>>
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter0
> >>> 0xff
> >>> localhost ~ # cat
> >>> /sys/bus/amba/devices/6046000.replicator/replicator0/mgmt/idfilter1
> >>> 0xff
> >>>
> >>
> >> Looking more into replicator1(swao_replicator) values as 0x0 even
> >> after replicator_reset()
> >> in replicator probe, I added dynamic_replicator_reset in
> >> dynamic_replicator_enable()
> >> and am not seeing any hardlockup. Also I added some prints to check
> >> the idfilter
> >> values before and after reset and found that its not set to 0xff even
> >> after replicator_reset()
> >> in replicator probe, I don't see any other path setting it to 0x0.
> >>
> >> After probe:
> >>
> >> [ 8.477669] func replicator_probe before reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> [ 8.489470] func replicator_probe after reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >
> > AFAICS, after the reset both of them are set to 0xff.
>
> Yes I see this too as we call replicator_reset() in probe. What I wanted
> to highlight was the below part where it is set to 0x0 before enabling
> dynamic replicator.
>
> >
> >> [ 8.502738] func replicator_probe before reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >> [ 8.515214] func replicator_probe after reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >
> >
> >
> >> localhost ~ #
> >> localhost ~ #
> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink
> >> localhost ~ #
> >> localhost ~ # echo 1 > /sys/bus/coresight/devices/etm0/enable_source
> >> [ 58.490485] func dynamic_replicator_enable before reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> [ 58.503246] func dynamic_replicator_enable after reset replicator
> >> replicator0 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> [ 58.520902] func dynamic_replicator_enable before reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0x0 REPLICATOR_IDFILTER1=0x0
> >
> > You need to find what is resetting the IDFILTERs to 0 for replicator1.
> >
>
> That is right.
>

By default all replicators have the IDFILTER registers set to 0 out of
hardware reset. This ensures that programmable replicators behave in
the same way as non-programmable replicators out of reset.

The dynamic_replicator_reset() is of course a driver state reset -
which filters out all trace on the output ports. The trace is then
enabled when we set the trace path from source to sink.

It seems to me that you have 2 problems that need solving here:

1) Why does the replicator_reset() called from probe() _not_ work
correctly on replicator 1? It seems to work later if you introduce a
reset after more of the system has powered and booted. This is
starting to look a little like a PM / clocking issue.

This failure means that, at the point where we try to set an output
port, both branches of this replicator are already enabled for output.
In effect, for this replicator, setting the output port changes
nothing, as the port is already enabled.

2) Why does having both ports of this replicator enabled cause a hard
lockup? This is a separate hardware / system issue.

The worst that should happen if both branches of a replicator are
enabled is that you get undesirable back pressure. (e.g. there is a
system we have seen - I think it is Juno - where there is a static
replicator feeding the TPIU and ETR - we need to disable the TPIU to
prevent undesired back pressure).
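
As a rough illustration of problem 1, here is a toy user-space model of
the two IDFILTER registers. It is only a sketch, not driver code: the
struct and all model_* names are made up for this mail, and it assumes
that enabling a port clears only that port's filter and leaves the
other register untouched, which is my reading of the patch under
discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a programmable replicator; not the real driver. */
struct replicator_model {
	uint32_t idfilter[2];	/* one ID filter register per output port */
};

#define MODEL_PASS_ALL	0x00u	/* no ID group filtered: all trace passes */
#define MODEL_BLOCK_ALL	0xffu	/* every ID group filtered: no trace passes */

/* Hardware reset: filters cleared, behaves like a static replicator. */
static void model_hw_reset(struct replicator_model *r)
{
	r->idfilter[0] = MODEL_PASS_ALL;
	r->idfilter[1] = MODEL_PASS_ALL;
}

/* Driver state reset: filter out all trace on both output ports. */
static void model_driver_reset(struct replicator_model *r)
{
	r->idfilter[0] = MODEL_BLOCK_ALL;
	r->idfilter[1] = MODEL_BLOCK_ALL;
}

/*
 * Assumption: enabling a port only opens that port and does not touch
 * the other one. Returns 0 on success, -1 for an invalid port number.
 */
static int model_enable_port(struct replicator_model *r, int port)
{
	if (port < 0 || port > 1)
		return -1;
	r->idfilter[port] = MODEL_PASS_ALL;
	return 0;
}

/* A port passes some trace unless every ID group is filtered. */
static bool model_port_passes_trace(const struct replicator_model *r, int port)
{
	return (r->idfilter[port] & MODEL_BLOCK_ALL) != MODEL_BLOCK_ALL;
}
```

With a driver reset that takes effect, model_enable_port(&r, 0) leaves
only port 0 passing trace. If the registers are still at their hardware
reset value when the port is enabled, both ports pass trace and the
enable call makes no difference, which is the state described in 1).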

Regards

Mike


> >> [ 58.533500] func dynamic_replicator_enable after reset replicator
> >> replicator1 REPLICATOR_IDFILTER0=0xff REPLICATOR_IDFILTER1=0xff
> >> localhost ~ #
> >>
> >> Can we have a replicator_reset in dynamic_replicator_enable?
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c
> >> b/drivers/hwtracing/coresight/coresight-replicator.c
> >> index e7dc1c31d20d..794f8e4c049f 100644
> >> --- a/drivers/hwtracing/coresight/coresight-replicator.c
> >> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> >> @@ -68,6 +68,8 @@ static int dynamic_replicator_enable(struct
> >> replicator_drvdata *drvdata,
> >> int rc = 0;
> >> u32 reg;
> >>
> >> + dynamic_replicator_reset(drvdata);
> >> +
> >
> > Again you are trying to mask an issue with this. Is the firmware
> > using the replicator for anything ? If so, this needs to be claimed
> > to prevent us from using it.
> >
>
> I was trying to narrow down further as you suggested. There are other
> ETMs like AOP ETM which use this replicator, will need to check with the
> firmware team for details.
>
> Thanks,
> Sai
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member
> of Code Aurora Forum, hosted by The Linux Foundation



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK