Re: powerpc: several early boot regressions on MPC52xx

From: Christophe Leroy
Date: Thu Dec 21 2023 - 06:17:59 EST




Le 21/12/2023 à 11:33, Matthias Schiffer a écrit :
> [Vous ne recevez pas souvent de courriers de matthias.schiffer@xxxxxxxxxxxxxxx. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>
> On Wed, 2023-12-20 at 14:55 +0000, Christophe Leroy wrote:
>>> Le 19/12/2023 à 14:34, Matthias Schiffer a écrit :
>>>>> [Vous ne recevez pas souvent de courriers de matthias.schiffer@xxxxxxxxxxxxxxx. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>>>>>
>>>>> On Mon, 2023-12-18 at 19:48 +0000, Christophe Leroy wrote:
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> Le 18/12/2023 à 14:48, Matthias Schiffer a écrit :
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm currently in the process of porting our ancient TQM5200 SoM to a modern kernel, and I've
>>>>>>>>> identified a number of regressions that cause early boot failures (before the UART console has been
>>>>>>>>> initialized) with arch/powerpc/configs/52xx/tqm5200_defconfig.
>>>>>>>
>>>>>>> "modern" kernel ==> which version ?
>>>>>
>>>>> Hi Christophe,
>>>>>
>>>>> I was testing with torvalds/master as of yesterday, and bisected everything from 4.14 to identify
>>>>> the commits related to the issues. For my current project, 6.1.y or 6.6.y will likely be our kernel
>>>>> of choice, but I'd also like to get mainline to a working state again if possible.
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Issue 1) Boot fails with CONFIG_PPC_KUAP enabled (enabled by default since 9f5bd8f1471d7
>>>>>>>>> "powerpc/32s: Activate KUAP and KUEP by default"). The reason is a number of of_iomap() calls in
>>>>>>>>> arch/powerpc/platforms/52xx that should be early_ioremap().
>>>>>>>
>>>>>>> Can you give more details and what leads you to that conclusion ?
>>>>>>>
>>>>>>> There should be no relation between KUAP and of_iomap()/early_ioremap().
>>>>>>> Problem is likely somewhere else.
>>>>>
>>>>> You are entirely right, the warnings about early_ioremap() were a red hering. I can't reproduce any
>>>>> difference in boot behavior anymore I thought I was seeing when changing the of_iomap() to
>>>>> early_ioremap(). I assume I got confused by testing for too many variables at once (kernel version +
>>>>> 2 Kconfig settings).
>>>>>
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> I can fix this up easy enough for mpc5200_simple by changing mpc5200_setup_xlb_arbiter() to use
>>>>>>>>> early_ioremap() and moving mpc52xx_map_common_devices() from the setup_arch to the init hook (one
>>>>>>>>> side effect is that mpc52xx_restart() only works a bit later, as it requires the mpc52xx_wdt mapping
>>>>>>>>> from mpc52xx_map_common_devices(); I'm not sure if there is a better solution).
>>>>>>>>>
>>>>>>>>> For the other 52xx platforms (efika, lite5200, media5200) things are a bit more chaotic, and they
>>>>>>>>> create several more temporary mappings from setup_arch. Either they should all be moved to the init
>>>>>>>>> hook as well, or be converted to early_ioremap(), but I can't tell which is more appropriate. As a
>>>>>>>>> first step, I would propose a patch that fixes this for the simple platforms and leaves the other
>>>>>>>>> ones unchanged.
>>>>>>>>>
>>>>>>>>> (Side note: I also found that before 16132529cee58 ("powerpc/32s: Rework Kernel Userspace Access
>>>>>>>>> Protection"), boot would succeed even with KUAP enabled without changing the incorrect of_iomap(); I
>>>>>>>>> guess the old implementation was more lenient about the incorrect calls that the kernel warns
>>>>>>>>> about?)
>>>>>>>
>>>>>>> Interesting.
>>>>>>> Again, there shouldn't be any impact of those incorrect calls. They are
>>>>>>> correct calls, it is just an historical method that we want to get rid
>>>>>>> of on day.
>>>>>>> Could you then provide the dmesg of what/how it works here ? And then
>>>>>>> I'd also be interested in a dump of /sys/kernel/debug/kernel_page_tables
>>>>>>> and /sys/kernel/debug/powerpc/block_address_translation
>>>>>>> and /sys/kernel/debug/powerpc/segment_registers
>>>>>>>
>>>>>>> For that you'll need CONFIG_PTDUMP_DEBUGFS
>>>>>
>>>>> As it turns out, whatever issue existed with KUAP at the time when it was changed to enabled by
>>>>> default for 32s (which was in 5.14) has been resolved in current mainline. Current torvalds/master
>>>>> boots fine with KUAP enabled, and only CONFIG_STRICT_KERNEL_RWX breaks the boot.
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Issue 2) Boot fails with CONFIG_STRICT_KERNEL_RWX enabled, which is also the default nowadays.
>>>>>>>>>
>>>>>>>>> I have not found the cause of this boot failure yet; is there any way to debug this if the failure
>>>>>>>>> happens before the UART is available and I currently don't have JTAG for this hardware?
>>>>>>>
>>>>>>> Shouldn't happen before UART is available, strict enforcement is
>>>>>>> perfomed by mark_readonly() and free_initmem() in the middle of
>>>>>>> kernel_init(). UART should be ON long before that.
>>>>>>>
>>>>>>> So it must be something in the setup that collides with CONFIG_KUAP and
>>>>>>> CONFIG_STRICT_KERNEL_RWX.
>>>>>>>
>>>>>>> Could you send dmesg of when it works (ie without
>>>>>>> CONFIG_KUAP/CONFIG_STRICT_KERNEL_RWX) and when it doesn't work if you
>>>>>>> get some initial stuff ?
>>>>>
>>>>> Here's the UART output of a working boot (CONFIG_STRICT_KERNEL_RWX disabled; I have slightly
>>>>> extended tqm5200.dts to enable UART output of the cuImage wrapper):
>>>>>
>>> ...
>>>>>
>>>>> When boot doesn't work, the last messages I see are from the cuImage wrapper ("Finalizing device
>>>>> tree... flat tree at ...). The panic is expected, there is no rootfs/initramfs in my current setup.
>>>>>
>>>
>>> Ok, so let's focus on CONFIG_STRICT_KERNEL_RWX then.
>>>
>>> The most efficient would be if you were able to activation your UART
>>> console earlier and/or implement some PPC_EARLY_DEBUG stuff to see where
>>> it fails.
>>>
>>> In your dmesg output, "Kernel memory protection not selected by kernel
>>> config" is when the strict RWX gets activated when selected. Your UART
>>> is enabled before that so if there was a problem with some driver
>>> writing in a RO area, it would be seen.
>>>
>>> One thing that came into my mind is that your CPU may have only 4 BATs
>>> instead of 8. But I hacked the definition for the e300c2 CPU and my
>>> board still boots with only 4 BATs so it is not that.
>>> The thing is that to work properly, BATs should at least cover all
>>> kernel. But I built your kernel with your .config and GCC 11.3 and I got
>>> something that fits within 8M with the RO part stopping at 4M, so you'll
>>> have one 4M BAT set RO, then another 4M BAT set RW, one 8M and one 16M
>>> BAT. It won't cover your entire 128M memory but shouldn't be a problem,
>>> just less performant.
>
> Hi Christophe,
>
> this seems indeed have something to do with the issue. mmu_mapin_ram() contains a
> strict_kernel_rwx_enabled() check that explains the early boot failure (and as this is a runtime
> check, I can actually make the kernel boot by passing rodata=off on the cmdline!). I've added debug
> output show me the addresses in mmu_mapin_ram(): base=00000000 top=08000000 border=00400000.
> Modifying mmu_mapin_ram() to always use the !strict_kernel_rwx_enabled() path makes the kernel boot
> until mark_readonly().
>
> Removing MMU_FTR_USE_HIGH_BATS from mmu_features or changing find_free_bat() to only use 4 BATs
> regardless of MMU_FTR_USE_HIGH_BATS results in a working kernel, but it is unclear to me why that
> would be necessary, as the MPC5200B manual clearly states that it has 8.
>

Maybe just because something is wrong in the way we set-up BATs.

Would be great if you could debug print with a bit more details what
happens inside __mmu_mapin_ram(), that is idx and base and size for
every call to setbat().

Maybe you can boot to a later point by removing the LOAD_BAT() in
function load_up_mmu() in arch/powerpc/kernel/head_book3s_32.S if it
helps you print stuff. Then maybe you can print and check the contents
of the BATS table.

Christophe