Re: powerpc: several early boot regressions on MPC52xx

From: Christophe Leroy
Date: Wed Dec 20 2023 - 09:55:37 EST




Le 19/12/2023 à 14:34, Matthias Schiffer a écrit :
> [Vous ne recevez pas souvent de courriers de matthias.schiffer@xxxxxxxxxxxxxxx. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>
> On Mon, 2023-12-18 at 19:48 +0000, Christophe Leroy wrote:
>> ********************
>> Achtung externe E-Mail: Öffnen Sie Anhänge und Links nur, wenn Sie wissen, dass diese aus einer sicheren Quelle stammen und sicher sind. Leiten Sie die E-Mail im Zweifelsfall zur Prüfung an den IT-Helpdesk weiter.
>> Attention external email: Open attachments and links only if you know that they are from a secure source and are safe. In doubt forward the email to the IT-Helpdesk to check it.
>> ********************
>>
>> Hi Matthias,
>>
>> Le 18/12/2023 à 14:48, Matthias Schiffer a écrit :
>>> Hi all,
>>>
>>> I'm currently in the process of porting our ancient TQM5200 SoM to a modern kernel, and I've
>>> identified a number of regressions that cause early boot failures (before the UART console has been
>>> initialized) with arch/powerpc/configs/52xx/tqm5200_defconfig.
>>
>> "modern" kernel ==> which version ?
>
> Hi Christophe,
>
> I was testing with torvalds/master as of yesterday, and bisected everything from 4.14 to identify
> the commits related to the issues. For my current project, 6.1.y or 6.6.y will likely be our kernel
> of choice, but I'd also like to get mainline to a working state again if possible.
>
>>
>>>
>>> Issue 1) Boot fails with CONFIG_PPC_KUAP enabled (enabled by default since 9f5bd8f1471d7
>>> "powerpc/32s: Activate KUAP and KUEP by default"). The reason is a number of of_iomap() calls in
>>> arch/powerpc/platforms/52xx that should be early_ioremap().
>>
>> Can you give more details and what leads you to that conclusion ?
>>
>> There should be no relation between KUAP and of_iomap()/early_ioremap().
>> Problem is likely somewhere else.
>
> You are entirely right, the warnings about early_ioremap() were a red hering. I can't reproduce any
> difference in boot behavior anymore I thought I was seeing when changing the of_iomap() to
> early_ioremap(). I assume I got confused by testing for too many variables at once (kernel version +
> 2 Kconfig settings).
>
>
>>
>>>
>>> I can fix this up easy enough for mpc5200_simple by changing mpc5200_setup_xlb_arbiter() to use
>>> early_ioremap() and moving mpc52xx_map_common_devices() from the setup_arch to the init hook (one
>>> side effect is that mpc52xx_restart() only works a bit later, as it requires the mpc52xx_wdt mapping
>>> from mpc52xx_map_common_devices(); I'm not sure if there is a better solution).
>>>
>>> For the other 52xx platforms (efika, lite5200, media5200) things are a bit more chaotic, and they
>>> create several more temporary mappings from setup_arch. Either they should all be moved to the init
>>> hook as well, or be converted to early_ioremap(), but I can't tell which is more appropriate. As a
>>> first step, I would propose a patch that fixes this for the simple platforms and leaves the other
>>> ones unchanged.
>>>
>>> (Side note: I also found that before 16132529cee58 ("powerpc/32s: Rework Kernel Userspace Access
>>> Protection"), boot would succeed even with KUAP enabled without changing the incorrect of_iomap(); I
>>> guess the old implementation was more lenient about the incorrect calls that the kernel warns
>>> about?)
>>
>> Interesting.
>> Again, there shouldn't be any impact of those incorrect calls. They are
>> correct calls, it is just an historical method that we want to get rid
>> of on day.
>> Could you then provide the dmesg of what/how it works here ? And then
>> I'd also be interested in a dump of /sys/kernel/debug/kernel_page_tables
>> and /sys/kernel/debug/powerpc/block_address_translation
>> and /sys/kernel/debug/powerpc/segment_registers
>>
>> For that you'll need CONFIG_PTDUMP_DEBUGFS
>
> As it turns out, whatever issue existed with KUAP at the time when it was changed to enabled by
> default for 32s (which was in 5.14) has been resolved in current mainline. Current torvalds/master
> boots fine with KUAP enabled, and only CONFIG_STRICT_KERNEL_RWX breaks the boot.
>
>>
>>>
>>> Issue 2) Boot fails with CONFIG_STRICT_KERNEL_RWX enabled, which is also the default nowadays.
>>>
>>> I have not found the cause of this boot failure yet; is there any way to debug this if the failure
>>> happens before the UART is available and I currently don't have JTAG for this hardware?
>>
>> Shouldn't happen before UART is available, strict enforcement is
>> perfomed by mark_readonly() and free_initmem() in the middle of
>> kernel_init(). UART should be ON long before that.
>>
>> So it must be something in the setup that collides with CONFIG_KUAP and
>> CONFIG_STRICT_KERNEL_RWX.
>>
>> Could you send dmesg of when it works (ie without
>> CONFIG_KUAP/CONFIG_STRICT_KERNEL_RWX) and when it doesn't work if you
>> get some initial stuff ?
>
> Here's the UART output of a working boot (CONFIG_STRICT_KERNEL_RWX disabled; I have slightly
> extended tqm5200.dts to enable UART output of the cuImage wrapper):
>
...
>
> When boot doesn't work, the last messages I see are from the cuImage wrapper ("Finalizing device
> tree... flat tree at ...). The panic is expected, there is no rootfs/initramfs in my current setup.
>

Ok, so let's focus on CONFIG_STRICT_KERNEL_RWX then.

The most efficient would be if you were able to activation your UART
console earlier and/or implement some PPC_EARLY_DEBUG stuff to see where
it fails.

In your dmesg output, "Kernel memory protection not selected by kernel
config" is when the strict RWX gets activated when selected. Your UART
is enabled before that so if there was a problem with some driver
writing in a RO area, it would be seen.

One thing that came into my mind is that your CPU may have only 4 BATs
instead of 8. But I hacked the definition for the e300c2 CPU and my
board still boots with only 4 BATs so it is not that.
The thing is that to work properly, BATs should at least cover all
kernel. But I built your kernel with your .config and GCC 11.3 and I got
something that fits within 8M with the RO part stopping at 4M, so you'll
have one 4M BAT set RO, then another 4M BAT set RW, one 8M and one 16M
BAT. It won't cover your entire 128M memory but shouldn't be a problem,
just less performant.

So what ? You said the size of the kernel is a problem for your
bootloader. Could it be that ? When built with CONFIG_STRICT_KERNEL_RWX,
__end_rodata is aligned to 0xc0400000 whereas without
CONFIG_STRICT_KERNEL_RWX __end_rodata is at 0xc038c000 and so the end of
the kernel (seen from System.map) is 0xc0055e000 with
CONFIG_STRICT_KERNEL_RWX and 0xc04de000 without it.

One thing you can try is to see if it works without
CONFIG_STRICT_KERNEL_RWX but with CONFIG_DATA_SHIFT forced to 22 which
is the value set when CONFIG_STRICT_KERNEL_RWX is selected.
To be able to set that value, you'll have to hack arch/powerpc/Kconfig
directly and force it to select value 22 regardless of
CONFIG_STRICT_KERNEL_RWX

Christophe