Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes

From: Florian Fainelli
Date: Wed Nov 29 2023 - 16:42:47 EST


On 11/29/23 13:33, Linus Walleij wrote:
On Wed, Nov 29, 2023 at 10:20 PM Rafał Miłecki <zajec5@xxxxxxxxx> wrote:

Here comes a more interesting experiment though. Putting there:

if (!(foo++ % 10000)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}

doesn't seem to help.


Putting the following there, however, seems to make the kernel/device stable:

if (!(foo++ % 100)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}

That's just too weird.

It does seem to indicate that idling for too long wreaks havoc, but it indeed does not make much sense. Without proper documentation for this SoC, it is hard to figure out what impact stopping the ARM CPU clock has on the rest of the memory subsystem, especially outside of the CPU. I do not believe that this SoC has any form of PLL clock gating or pulse skipping.


I think I'm just going to assume those chipsets are simply hw broken.

If disabling CPU idle on these altogether stabilizes them, then maybe that
is what we need to do?

Yes, please try booting with "nohlt" set on the kernel command line and see how that fares.

It would also be useful to dump L2CTLR and L2ECTLR. This is a complete shot in the dark, but I was initially wondering whether there could be some retention issues, and would have recommended disabling the L2 retention policy completely just for testing.

MRC p15, 1, <Rt>, c9, c0, 2;

of particular interest here would be the bit at position 0; try changing it to 1 (3 cycles) or 0 (2 cycles) and see if that changes anything.
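
As a rough illustration of the check on bit 0, here is a minimal sketch in C. The helper names (read_l2ctlr, l2ctlr_data_ram_cycles) and the macro are hypothetical, and the bit meaning is taken only from the description above (0 = 2 cycles, 1 = 3 cycles). The MRC has to run in a privileged mode, e.g. from kernel code, and is compiled only for 32-bit ARM. Writing the value back would use the matching MCR encoding; whether that write is permitted on this SoC is not something this sketch can confirm.

```c
#include <stdint.h>

/* Assumption: bit 0 of L2CTLR selects the L2 data RAM latency, as described above. */
#define L2CTLR_DATA_RAM_LATENCY	(1u << 0)

#if defined(__arm__)
/* Hypothetical helper: read L2CTLR. Privileged mode only (e.g. kernel code). */
static inline uint32_t read_l2ctlr(void)
{
	uint32_t val;

	__asm__ volatile("mrc p15, 1, %0, c9, c0, 2" : "=r"(val));
	return val;
}
#endif

/* Decode the data RAM latency in cycles from a raw L2CTLR value. */
static inline unsigned int l2ctlr_data_ram_cycles(uint32_t l2ctlr)
{
	return (l2ctlr & L2CTLR_DATA_RAM_LATENCY) ? 3 : 2;
}
```

Logging l2ctlr_data_ram_cycles(read_l2ctlr()) next to the raw value at boot should be enough to tell which latency is currently configured.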

MRC p15, 1, <Rt>, c9, c0, 3;

the lower bits are reserved, so I would not necessarily expect them to map to configurable latencies, but if you see non-zero values in bits [28:0], try changing them to 0 and see if that changes anything.
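
For the L2ECTLR check, the bit manipulation could look roughly like the sketch below. The macro and helper names are hypothetical; the code only works on a raw register value, which would come from the MRC shown above (in privileged mode), and the cleared value would be written back with the matching MCR encoding.

```c
#include <stdint.h>

/* Mask covering bits [28:0] of L2ECTLR, the range mentioned above. */
#define L2ECTLR_LOW_BITS_MASK	0x1fffffffu

/* Return the contents of bits [28:0]; non-zero means something is set there. */
static inline uint32_t l2ectlr_low_bits(uint32_t l2ectlr)
{
	return l2ectlr & L2ECTLR_LOW_BITS_MASK;
}

/* Return the value with bits [28:0] cleared and bits [31:29] left untouched. */
static inline uint32_t l2ectlr_clear_low_bits(uint32_t l2ectlr)
{
	return l2ectlr & ~L2ECTLR_LOW_BITS_MASK;
}
```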

Thanks for your persistence!
--
Florian
