Re: [PATCH] coresight: dynamic-replicator: Fix handling of multiple connections

From: Mike Leach
Date: Mon Apr 27 2020 - 05:46:10 EST


HI,

On Mon, 27 Apr 2020 at 10:15, Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
>
> On 04/26/2020 03:37 PM, Sai Prakash Ranjan wrote:
> > Since commit 30af4fb619e5 ("coresight: dynamic-replicator:
> > Handle multiple connections"), we do not make sure that
> > the other port is disabled when the dynamic replicator is
> > enabled. This is seen to cause the CPU hardlockup atleast
> > on SC7180 SoC with the following topology when enabling ETM
> > with ETR as the sink via sysfs. Since there is no trace id
> > logic in coresight yet to make use of multiple sinks in
> > parallel for different trace sessions, disable the other
> > port when one port is turned on.
> >
> > etm0_out
> > |
> > apss_funnel_in0
> > |
> > apss_merge_funnel_in
> > |
> > funnel1_in4
> > |
> > merge_funnel_in1
> > |
> > swao_funnel_in
> > |
> > etf_in
> > |
> > swao_replicator_in
> > |
> > replicator_in
> > |
> > etr_in
> >
> > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> > CPU: 7 PID: 0 Comm: swapper/7 Tainted: G S B 5.4.25 #100
> > Hardware name: Qualcomm Technologies, Inc. SC7180 IDP (DT)
> > Call trace:
> > dump_backtrace+0x0/0x188
> > show_stack+0x20/0x2c
> > dump_stack+0xdc/0x144
> > panic+0x168/0x370
> > arch_seccomp_spec_mitigate+0x0/0x14
> > watchdog_timer_fn+0x68/0x290
> > __hrtimer_run_queues+0x264/0x498
> > hrtimer_interrupt+0xf0/0x22c
> > arch_timer_handler_phys+0x40/0x50
> > handle_percpu_devid_irq+0x8c/0x158
> > __handle_domain_irq+0x84/0xc4
> > gic_handle_irq+0x100/0x1c4
> > el1_irq+0xbc/0x180
> > arch_cpu_idle+0x3c/0x5c
> > default_idle_call+0x1c/0x38
> > do_idle+0x100/0x280
> > cpu_startup_entry+0x24/0x28
> > secondary_start_kernel+0x15c/0x170
> > SMP: stopping secondary CPUs
> >
> > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@xxxxxxxxxxxxxx>
> > Tested-by: Stephen Boyd <swboyd@xxxxxxxxxxxx>
> > ---
> > Changes since RFC:
> > * Reworded commit text and included the topology on SC7180.
>
>
> > ---
> > .../hwtracing/coresight/coresight-replicator.c | 15 ++++++++++-----
> > 1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
> > index e7dc1c31d20d..f4eaa38f8f43 100644
> > --- a/drivers/hwtracing/coresight/coresight-replicator.c
> > +++ b/drivers/hwtracing/coresight/coresight-replicator.c
> > @@ -66,14 +66,16 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
> > int inport, int outport)
> > {
> > int rc = 0;
> > - u32 reg;
> > + u32 reg0, reg1;
> >
> > switch (outport) {
> > case 0:
> > - reg = REPLICATOR_IDFILTER0;
> > + reg0 = REPLICATOR_IDFILTER0;
> > + reg1 = REPLICATOR_IDFILTER1;
> > break;
> > case 1:
> > - reg = REPLICATOR_IDFILTER1;
> > + reg0 = REPLICATOR_IDFILTER1;
> > + reg1 = REPLICATOR_IDFILTER0;
> > break;
> > default:
> > WARN_ON(1);
> > @@ -87,8 +89,11 @@ static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
> > rc = coresight_claim_device_unlocked(drvdata->base);
> >
> > /* Ensure that the outport is enabled. */
> > - if (!rc)
> > - writel_relaxed(0x00, drvdata->base + reg);
> > + if (!rc) {
> > + writel_relaxed(0x00, drvdata->base + reg0);
> > + writel_relaxed(0xff, drvdata->base + reg1);
> > + }
> > +
>
> This is not sufficient. You must prevent another session trying to
> enable the other port of the replicator as this could silently fail
> the "on-going" session. Not ideal. Fail the attempt to enable a port
> if the other port is active. You could track this in software and
> fail early.
>
> Suzuki

While I have no issue in principle with not enabling a path to a sink
that is not in use - indeed in some cases attaching to unused sinks
can cause back-pressure that slows throughput (cf TPIU) - I am
concerned that this modification is masking an underlying issue with
the platform in question.

Should we decide to enable the diversion of different IDs to different
sinks or allow different sessions go to different sinks, then this has
potential to fail on the SC7180 SoC - and it will be difficult in
future to associate a problem with this discussion.

Regards

Mike




--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK