Re: Bug report: kernel paniced while booting

From: Jessica Clarke
Date: Mon Jun 05 2023 - 22:29:40 EST


On 6 Jun 2023, at 01:48, Icenowy Zheng <uwu@xxxxxxxxxx> wrote:
>
> On Mon, 2023-06-05 at 13:55 -0700, Atish Patra wrote:
>> On Mon, Jun 5, 2023 at 8:13 AM Sunil V L <sunilvl@xxxxxxxxxxxxxxxx>
>> wrote:
>>>
>>> On Mon, Jun 05, 2023 at 04:25:06PM +0200, Alexandre Ghiti wrote:
>>>> Hi Song,
>>>>
>>>> On Mon, Jun 5, 2023 at 12:52 PM Song Shuai
>>>> <songshuaishuai@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Description of problem:
>>>>>
>>>>> Booting Linux with the RiscVVirtQemu edk2 firmware triggers a
>>>>> Store/AMO page fault that leads to a kernel panic.
>>>>> The entire log has been posted at this link:
>>>>> https://termbin.com/nga4.
>>>>>
>>>>> You can reproduce it with the following steps:
>>>>>
>>>>> 1. Prepare the environment with
>>>>> - QEMU virt: v8.0.0 (with OpenSBI v1.2)
>>>>> - edk2: at commit (2bc8545883 "UefiCpuPkg/CpuPageTableLib:
>>>>> Reduce the number of random tests")
>>>>> - Linux: v6.4-rc1 and later versions
>>>>>
>>>>> 2. Start the QEMU virt board
>>>>>
>>>>> ```sh
>>>>> $ cat ~/8_riscv/start_latest.sh
>>>>> #!/bin/bash
>>>>> /home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
>>>>> -s -nographic \
>>>>> -drive file=/home/song/8_riscv/3_acpi/Build_virt/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT.fd,if=pflash,format=raw,unit=1 \
>>>>> -machine virt,acpi=off -smp 2 -m 2G \
>>>>> -kernel /home/song/9_linux/linux/00_rv_def/arch/riscv/boot/Image \
>>>>> -initrd /home/song/8_riscv/3_acpi/buildroot/output/images/rootfs.ext2 \
>>>>> -append "root=/dev/ram ro console=ttyS0 earlycon=uart8250,mmio,0x10000000 efi=debug loglevel=8 memblock=debug" ## also panic by memtest
>>>>> ```
>>>>> 3. Then you will encounter the kernel panic logged in the above
>>>>> link
>>>>>
>>>>> Other Information:
>>>>>
>>>>> 1. -------
>>>>>
>>>>> This report is not identical to my prior report -- "kernel
>>>>> paniced when system hibernates" [1], but both of them
>>>>> are closely related to the commit (3335068f8721 "riscv: Use
>>>>> PUD/P4D/PGD pages for the linear mapping").
>>>>>
>>>>> With this commit, hibernation traps with an "access fault"
>>>>> while accessing the PMP-protected regions
>>>>> (mmode_resv0@80000000)
>>>>> from OpenSBI (BTW, hibernation is marked as nonportable by
>>>>> Conor[2]).
>>>>>
>>>>> In this report, efi_init hands off the memory mapping from Boot
>>>>> Services to memblock, which reserves mmode_resv0@80000000,
>>>>> so there is no "access fault" but a "page fault".
>>>>>
>>>>> And reverting commit 3335068f8721 indeed fixed this panic.
>>>>>
>>>>> 2. -------
>>>>>
>>>>> As the gdb-pt-dump [3] tool shows, the PTE covering the
>>>>> faulting virtual address had the appropriate store permission.
>>>>> Is there another way to trigger the "Store/AMO page fault"? Or
>>>>> did the creation of the linear mapping in commit 3335068f8721
>>>>> do something wrong?
>>>>>
>>>>> ```
>>>>> (gdb) p/x $satp
>>>>> $1 = 0xa000000000081708
>>>>> (gdb) pt -satp 0xa000000000081708
>>>>> Address : Length Permissions
>>>>> 0xff1bfffffea39000 : 0x1000 | W:1 X:0 R:1 S:1
>>>>> 0xff1bfffffebf9000 : 0x1000 | W:1 X:0 R:1 S:1
>>>>> 0xff1bfffffec00000 : 0x400000 | W:1 X:0 R:1 S:1
>>>>> 0xff60000000000000 : 0x1c0000 | W:1 X:0 R:1 S:1
>>>>> 0xff60000000200000 : 0xa00000 | W:0 X:0 R:1 S:1
>>>>> 0xff60000000c00000 : 0x7f000000 | W:1 X:0 R:1 S:1 // badaddr: ff6000007fdb1000
>>>>> 0xff6000007fdc0000 : 0x3d000 | W:1 X:0 R:1 S:1
>>>>> 0xff6000007ffbf000 : 0x1000 | W:1 X:0 R:1 S:1
>>>>> 0xffffffff80000000 : 0xc00000 | W:0 X:1 R:1 S:1
>>>>> 0xffffffff80c00000 : 0xa00000 | W:1 X:0 R:1 S:1
>>>>>
>>>>> ```
>>>>>
>>>>> 3. ------
>>>>>
>>>>> You can also reproduce a similar panic by appending "memtest" to
>>>>> the kernel cmdline.
>>>>> I have posted the memtest boot log at this link:
>>>>> https://termbin.com/1twl.
>>>>>
>>>>> Please correct me if I'm wrong.
>>>>>
>>>>> [1]:
>>>>> https://lore.kernel.org/linux-riscv/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@xxxxxxxxxxxxxx/
>>>>> [2]:
>>>>> https://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/
>>>>> [3]: https://github.com/martinradev/gdb-pt-dump
>>>>
>>>> Thanks for the thorough report, really appreciated.
>>>>
>>>> So there are multiple issues here:
>>>>
>>>> - the first one is that the memory region for opensbi is marked as
>>>> not cacheable in the efi memory map, and then this region is not
>>>> mapped in the linear mapping:
>>>> [ 0.000000] efi: 0x000080000000-0x00008003ffff [Reserved | | | | | | | | | | | | | |UC]
>>>>
>>
>> @Alex: The OpenSBI region is marked reserved because EDK2 chooses to
>> do that explicitly as explained by Sunil.
>> I don't think UC has anything to do with it. All the EFI memory
>> regions are UC.
>>
>>>> - the second one (that I feel a bit ashamed of...) is that I did not
>>>> check the alignment of the virtual address when choosing the map size
>>>> in best_map_size() and then we end up trying to map a physical region
>>>> aligned on 2MB that is actually not aligned on 2MB virtually because
>>>> the opensbi region is not mapped at all.
>>>>
>>>> - the possible third one is that we should not map the linear mapping
>>>> using 4K pages, this would be slow in my opinion, and I think we
>>>> should waste a bit of memory to align va and pa on a 2MB boundary.
>>>>
>>>> So I'll fix the second issue, and possibly the third one, and if no
>>>> one looks into why the opensbi region is mapped in UC, I'll take a
>>>> look at edk2.
>>>>
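
The alignment problem in the second point can be sketched as follows. This is a minimal illustration only, not the kernel's best_map_size(): the helper names pick_map_size() and is_aligned() and the SZ_* constants are hypothetical, and the only thing it is meant to show is that a large page is usable when both the physical and the virtual address are aligned to it.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* True when addr is a multiple of align; align must be a power of two. */
static int is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/*
 * Hypothetical helper (not the kernel's best_map_size()): pick the largest
 * page size that can map `size` bytes at physical address pa and virtual
 * address va. Both addresses must be aligned, not only the physical one.
 */
static uint64_t pick_map_size(uint64_t pa, uint64_t va, uint64_t size)
{
    static const uint64_t candidates[] = { SZ_1G, SZ_2M };

    for (unsigned i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        uint64_t sz = candidates[i];

        if (size >= sz && is_aligned(pa, sz) && is_aligned(va, sz))
            return sz;
    }
    return SZ_4K; /* fall back to base pages */
}
```

With illustrative values (not taken from the log), pick_map_size(0x80200000, 0xff600000001c0000, 0x400000) falls back to SZ_4K because the virtual address is not 2MB-aligned, while the same physical address paired with va 0xff60000000200000 yields SZ_2M.
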
>>> Hi Alex,
>>>
>>> EDK2 marks the opensbi range as reserved memory in the EFI map.
>>> According to the DT spec, if no-map is not set, we need to mark it
>>> as EfiBootServicesData, but EfiBootServicesData is actually
>>> considered free memory by the kernel, as per the UEFI spec. To
>>> prevent the kernel from using this memory, we deviated from the DT
>>> spec for opensbi ranges.
>>>
>>
>> IMO, that should be the correct way unless we can change it to
>> EfiRuntimeServicesData/Code.
>> Looking at U-Boot code, it sets the no-map region to
>> EfiBootServicesData which may cause
>> issues in RISC-V as well if the linear mapping sets up the initial
>> 2MB.
>
> Semantically I think no-map means the kernel should not be utilizing
> it, so it should be EfiRuntimeServicesData instead.

no-map *is* EfiReservedMemoryType in the DT spec; the problem is that
OpenSBI isn’t marking it no-map, but has nothing in its place to mark
it as “you can map it but you really cannot touch it”; normally
reserved memory is “don’t treat it as normal memory but you can use it
for specific purposes (e.g. DMA memory pool)”, so it gets treated as
the latter (what that state has always meant) despite definitely not
being that. Or, to put it another way, OpenSBI went from something that
was correct-but-slow to something that is incorrect-but-fast and the
consumers of the FDT are now paying the price for its brokenness. So we
need some other approach that isn’t incorrect by definition.
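
To make the gap concrete, here is a minimal sketch, assuming the rule Sunil quotes above (no no-map means EfiBootServicesData) together with the Devicetree Specification's rule that no-map regions are reported as EfiReservedMemoryType. The enum spellings and both helper names are hypothetical; only the numeric values follow the UEFI memory type list.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the UEFI memory types; spellings here are hypothetical. */
enum efi_memory_type {
    EFI_RESERVED_MEMORY_TYPE = 0,
    EFI_BOOT_SERVICES_DATA   = 4,
    EFI_CONVENTIONAL_MEMORY  = 7,
};

/*
 * Hypothetical helper: the EFI type firmware is expected to report for a
 * /reserved-memory node, depending only on whether no-map is present.
 */
static enum efi_memory_type efi_type_for_reserved_node(bool no_map)
{
    return no_map ? EFI_RESERVED_MEMORY_TYPE : EFI_BOOT_SERVICES_DATA;
}

/*
 * Hypothetical helper: whether an OS treats the type as ordinary free RAM
 * once boot services have exited.
 */
static bool os_treats_as_free_ram(enum efi_memory_type type)
{
    switch (type) {
    case EFI_BOOT_SERVICES_DATA:
    case EFI_CONVENTIONAL_MEMORY:
        return true;
    case EFI_RESERVED_MEMORY_TYPE:
        return false;
    }
    assert(!"type outside the subset modelled here");
    return false;
}
```

Under this model a region published without no-map comes out as memory the OS may reuse, and there is no third outcome meaning "mappable but never touched".
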

Jess

>>
>>
>>> Let me know your thoughts on how we can handle this better in EDK2
>>> considering it has to support ACPI also.
>>>
>>> Thanks,
>>> Sunil
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@xxxxxxxxxxxxxxxxxxx
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>>
>>