Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes

From: Rafał Miłecki
Date: Wed Nov 29 2023 - 16:20:53 EST


Hi,

It's a late reply, but I couldn't find enough determination earlier.

On 8.09.2023 10:10, Linus Walleij wrote:
On Mon, Sep 4, 2023 at 10:34 AM Rafał Miłecki <zajec5@xxxxxxxxx> wrote:

I'm clueless at this point.
Maybe someone can come up with an idea of actual issue & ideally a
solution.

Damn this is frustrating.

2. Clock (arm,armv7-timer)

While comparing the main clock in Broadcom's SDK with the upstream one I
noticed a tiny difference: the mask value. I don't know if it makes any
sense, but switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in
arm_arch_timer.c (to match the SDK) increases average uptime (time before
a hang/lockup happens) from 4 minutes to 36 minutes.
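
For reference, a rough userspace sketch of what the mask width changes. It assumes the elapsed cycle count is computed as (now - last) & mask; mask_sketch(), delta_sketch() and the counter readings are made up for illustration and are not the kernel code. It only shows how the same two readings are interpreted under each mask. It does not explain the uptime difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's CLOCKSOURCE_MASK(bits). */
static uint64_t mask_sketch(unsigned int bits)
{
	return bits < 64 ? (UINT64_C(1) << bits) - 1 : UINT64_MAX;
}

/* Assumed delta computation: elapsed cycles, truncated to the mask width. */
static uint64_t delta_sketch(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Made-up readings: the counter was 3 ticks before a 56-bit wrap, then at 5. */
static void check_wrap_sketch(void)
{
	uint64_t last = mask_sketch(56) - 2;	/* 2^56 - 3 */
	uint64_t now = 5;

	/* 56-bit mask: the wrap is absorbed, 8 cycles elapsed. */
	assert(delta_sketch(now, last, mask_sketch(56)) == 8);
	/* 64-bit mask: the same readings look like a huge forward jump. */
	assert(delta_sketch(now, last, mask_sketch(64)) ==
	       UINT64_C(0xFF00000000000008));
}
```

So the two masks only disagree around a wrap of a counter narrower than 64 bits, or when the upper bits of a reading are not what one expects.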

This could be related to how often the system goes to idle.

+	if (cpu_idle_force_poll == 1234)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 5678)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 1234)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 5678)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 1234)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 5678)
+		arch_cpu_idle();
+	if (cpu_idle_force_poll == 1234)
+		arch_cpu_idle();

Idle again.

I would have tried to see what arch_cpu_idle() is doing.

arm_pm_idle() or cpu_do_idle()?

In my case arm_pm_idle is NULL.


What happens if you just put return in arch_cpu_idle()
so it does nothing?

Doesn't help. I also tried putting:
udelay(10);
and
udelay(1000);
at the beginning of arch_cpu_idle(). Neither helped.


Here comes a more interesting experiment, though. Putting this there:

if (!(foo++ % 10000)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}

doesn't seem to help.


Putting the following there, however, seems to make the kernel/device stable:

if (!(foo++ % 100)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}
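
To put the two periods side by side, here is a small userspace sketch that counts how often the !(foo++ % period) condition fires. count_prints_sketch() is a made-up helper, foo is assumed to start at 0, and the print itself is replaced by a counter, so this says nothing about how long pr_info() takes on the device.

```c
#include <assert.h>

/* Hypothetical helper: how often `!(foo++ % period)` is true over `calls` calls. */
static unsigned int count_prints_sketch(unsigned int calls, unsigned int period)
{
	unsigned int foo = 0;	/* assumed to start at zero */
	unsigned int prints = 0;

	for (unsigned int i = 0; i < calls; i++) {
		if (!(foo++ % period))
			prints++;	/* stands in for the pr_info() call */
	}
	return prints;
}

/* Over 10000 idle entries: 100 prints with % 100, a single one with % 10000. */
static void check_periods_sketch(void)
{
	assert(count_prints_sketch(10000, 100) == 100);
	assert(count_prints_sketch(10000, 10000) == 1);
}
```

In other words, the variant that appears stable prints 100 times as often per idle entry as the one that does not help.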


I think I'm just going to assume those chipsets simply have broken hardware.