Re: PPC476 hangs during tlb flush after calling /init in crash kernel with linux 5.4+

From: Christophe Leroy
Date: Wed Apr 28 2021 - 02:08:26 EST

On 28/04/2021 at 00:42, Eddie James wrote:
On Tue, 2021-04-27 at 19:26 +0200, Christophe Leroy wrote:
Hi Eddie,

On 27/04/2021 at 19:03, Eddie James wrote:
Hi all,

I'm having a problem, both in simulation and on hardware, where my PPC476
processor stops executing instructions after calling /init. In my case
this is a bash script. The code descends to flush the TLB, and somewhere
in the loop in _tlbil_pid the PC goes to InstructionTLBError47x but does
not go any further. This only occurs in the crash kernel environment,
which uses the same kernel, initramfs, and init script as the main
kernel, where everything executed fine. I do not see this problem with
Linux 4.19 or 3.10, but I do see it with 5.4 and 5.10. I see a fair
amount of refactoring in the PPC memory management area between 4.19
and 5.4. Can anyone point me in a direction to debug this further? My
stack trace is below, since I can run gdb in simulation.

Can you bisect to pinpoint the culprit commit?

Hi, thanks for your prompt reply.

Good idea! I have bisected to:

commit 9e849f231c3c72d4c3c1b07c9cd19ae789da0420 (b8-bad,
refs/bisect/bad)
Author: Christophe Leroy <christophe.leroy@xxxxxx>
Date: Thu Feb 21 19:08:40 2019 +0000

powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
Now that mmu_mapin_ram() is able to handle other blocks
than the one starting at 0, the WII can use it for all
its blocks.
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>

I also confirmed that reverting this commit resolves the issue in 5.4+.

Now, I don't understand why this is problematic or what is really
happening. Reverting is probably not the desired solution.

Can you provide the 'dmesg' output or a dump of the logs printed by the kernel at boot time?

The difference introduced by this commit is that when there are several memblocks, all of them get mapped. Maybe your target doesn't like that.

You mention simulation. Are you using QEMU? If so, can you provide details so that I can try to reproduce the issue?

Thanks
Christophe