Re: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a node"

From: Brian Masney
Date: Wed Jun 07 2023 - 11:31:04 EST


Hi Lucas,

On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.

I am all for reverting this commit however I think your commit message
needs cleaned up.

> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f

Personally I'd remove the mention of the downstream kernel is this case.

Also your paragraphs are formatted weird with a newline at the end
of every sentence. Get them to flow together as a regular paragraph.
This is the relevant line that I have in my muttrc file to help.

set editor="vim -c 'set spell spelllang=en' -c 'set tw=72' -c 'set wrap'"

> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu: (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
> __do_softirq
> ____do_softirq
> call_on_irq_stack
> do_softirq_own_stack
> __irq_exit_rcu
> irq_exit_rcu
>
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).

I would also add that Qualcomm told us that upgrading the firmware on
the PCIe switch would correct this issue. We've upgraded the PCIe switch
to the latest firmware and this issue is still present. Apparently we
need to use a specific older version of the firmware that we can't get
from the PCIe switch vendor or Qualcomm.

Nothing is hooked up to pcie2a on the QDrive3 so there's no loss in
functionality by disabling this. We always have to remember to revert
this commit when working with an upstream kernel.

> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're find with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.

This should go under the ---.

Brian