Re: Bug report: kernel panicked while booting

From: Atish Patra
Date: Tue Jun 06 2023 - 14:22:21 EST


On Tue, Jun 6, 2023 at 12:26 AM Alexandre Ghiti <alex@xxxxxxxx> wrote:
>
>
> On 06/06/2023 08:40, Sunil V L wrote:
> > On Mon, Jun 05, 2023 at 10:42:33PM +0100, Jessica Clarke wrote:
> >> On 5 Jun 2023, at 16:12, Sunil V L <sunilvl@xxxxxxxxxxxxxxxx> wrote:
> >>> On Mon, Jun 05, 2023 at 04:25:06PM +0200, Alexandre Ghiti wrote:
> >>>> Hi Song,
> >>>>
> >>>> On Mon, Jun 5, 2023 at 12:52 PM Song Shuai <songshuaishuai@xxxxxxxxxxx> wrote:
> >>>>> Description of problem:
> >>>>>
> >>>>> When booting Linux with the RiscVVirtQemu edk2 firmware, a Store/AMO page fault is trapped and triggers a kernel panic.
> >>>>> The entire log has been posted at this link : https://termbin.com/nga4.
> >>>>>
> >>>>> You can reproduce it with the following step :
> >>>>>
> >>>>> 1. prepare the environment with
> >>>>> - Qemu-virt: v8.0.0 (with OpenSbi v1.2)
> >>>>> - edk2 : at commit (2bc8545883 "UefiCpuPkg/CpuPageTableLib: Reduce the number of random tests")
> >>>>> - Linux : v6.4-rc1 and later version
> >>>>>
> >>>>> 2. start the Qemu virt board
> >>>>>
> >>>>> ```sh
> >>>>> $ cat ~/8_riscv/start_latest.sh
> >>>>> #!/bin/bash
> >>>>> /home/song/8_riscv/3_acpi/qemu/ooo/usr/local/bin/qemu-system-riscv64 \
> >>>>> -s -nographic -drive file=/home/song/8_riscv/3_acpi/Build_virt/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT.fd,if=pflash,format=raw,unit=1 \
> >>>>> -machine virt,acpi=off -smp 2 -m 2G \
> >>>>> -kernel /home/song/9_linux/linux/00_rv_def/arch/riscv/boot/Image \
> >>>>> -initrd /home/song/8_riscv/3_acpi/buildroot/output/images/rootfs.ext2 \
> >>>>> -append "root=/dev/ram ro console=ttyS0 earlycon=uart8250,mmio,0x10000000 efi=debug loglevel=8 memblock=debug" ## also panic by memtest
> >>>>> ```
> >>>>> 3. Then you will encounter the kernel panic logged in the above link
> >>>>>
> >>>>> Other Information:
> >>>>>
> >>>>> 1. -------
> >>>>>
> >>>>> This report is not identical to my prior report -- "kernel paniced when system hibernates" [1], but both of them
> >>>>> are closely related with the commit (3335068f8721 "riscv: Use PUD/P4D/PGD pages for the linear mapping").
> >>>>>
> >>>>> With this commit, hibernation is trapped with "access fault" while accessing the PMP-protected regions (mmode_resv0@80000000)
> >>>>> from OpenSbi (BTW, hibernation is marked as nonportable by Conor[2]).
> >>>>>
> >>>>> In this report, efi_init hands off the memory map from Boot Services to memblock, which reserves mmode_resv0@80000000,
> >>>>> so the result is a "page fault" rather than an "access fault".
> >>>>>
> >>>>> And reverting commit 3335068f8721 indeed fixed this panic.
> >>>>>
> >>>>> 2. -------
> >>>>>
> >>>>> As the gdb-pt-dump [3] tool shows, the PTE which covered the fault virtual address had the appropriate permission to store.
> >>>>> Is there another way to trigger the "Store/AMO page fault"? Or the creation of linear mapping in commit 3335068f8721 did something wrong?
> >>>>>
> >>>>> ```
> >>>>> (gdb) p/x $satp
> >>>>> $1 = 0xa000000000081708
> >>>>> (gdb) pt -satp 0xa000000000081708
> >>>>> Address : Length Permissions
> >>>>> 0xff1bfffffea39000 : 0x1000 | W:1 X:0 R:1 S:1
> >>>>> 0xff1bfffffebf9000 : 0x1000 | W:1 X:0 R:1 S:1
> >>>>> 0xff1bfffffec00000 : 0x400000 | W:1 X:0 R:1 S:1
> >>>>> 0xff60000000000000 : 0x1c0000 | W:1 X:0 R:1 S:1
> >>>>> 0xff60000000200000 : 0xa00000 | W:0 X:0 R:1 S:1
> >>>>> 0xff60000000c00000 : 0x7f000000 | W:1 X:0 R:1 S:1 // badaddr: ff6000007fdb1000
> >>>>> 0xff6000007fdc0000 : 0x3d000 | W:1 X:0 R:1 S:1
> >>>>> 0xff6000007ffbf000 : 0x1000 | W:1 X:0 R:1 S:1
> >>>>> 0xffffffff80000000 : 0xc00000 | W:0 X:1 R:1 S:1
> >>>>> 0xffffffff80c00000 : 0xa00000 | W:1 X:0 R:1 S:1
> >>>>>
> >>>>> ```
> >>>>>
> >>>>> 3. ------
> >>>>>
> >>>>> You can also reproduce similar panic by appending "memtest" in kernel cmdline.
> >>>>> I have posted the memtest boot log at this link: https://termbin.com/1twl.
> >>>>>
> >>>>> Please correct me if I'm wrong.
> >>>>>
> >>>>> [1]: https://lore.kernel.org/linux-riscv/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@xxxxxxxxxxxxxx/
> >>>>> [2]: https://lore.kernel.org/linux-riscv/20230526-astride-detonator-9ae120051159@wendy/
> >>>>> [3]: https://github.com/martinradev/gdb-pt-dump
> >>>> Thanks for the thorough report, really appreciated.
> >>>>
> >>>> So there are multiple issues here:
> >>>>
> >>>> - the first one is that the memory region for opensbi is marked as not
> >>>> cacheable in the efi memory map, and then this region is not mapped in
> >>>> the linear mapping:
> >>>> [ 0.000000] efi: 0x000080000000-0x00008003ffff [Reserved | | | | | | | | | | | | | |UC]
> >>>>
> >>>> - the second one (that I feel a bit ashamed of...) is that I did not
> >>>> check the alignment of the virtual address when choosing the map size
> >>>> in best_map_size() and then we end up trying to map a physical region
> >>>> aligned on 2MB that is actually not aligned on 2MB virtually because
> >>>> the opensbi region is not mapped at all.
> >>>>
> >>>> - the possible third one is that we should not map the linear mapping
> >>>> using 4K pages, this would be slow in my opinion, and I think we
> >>>> should waste a bit of memory to align va and pa on a 2MB boundary.
> >>>>
> >>>> So I'll fix the second issue, and possibly the third one, and if no
> >>>> one looks into why the opensbi region is mapped in UC, I'll take a
> >>>> look at edk2.
> >>>>
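
To illustrate the second issue above (choosing a map size from the
physical alignment alone), here is a minimal Python model. It is only a
sketch: the name pick_map_size, the three leaf sizes, and the example
addresses are assumptions for illustration, not the kernel's
best_map_size() code.

```python
# Illustrative model only; names and structure are assumptions, not kernel code.
PAGE_SIZE = 4 << 10   # 4 KiB base page
PMD_SIZE = 2 << 20    # 2 MiB leaf
PUD_SIZE = 1 << 30    # 1 GiB leaf


def is_aligned(addr, size):
    """True if addr is a multiple of size (size must be a power of two)."""
    return (addr & (size - 1)) == 0


def pick_map_size(pa, va, size):
    """Largest leaf size usable to map `size` bytes of `pa` at `va`.

    Both the physical and the virtual address must be aligned on the
    candidate size; checking only `pa` is the mistake described above.
    """
    for candidate in (PUD_SIZE, PMD_SIZE):
        if is_aligned(pa, candidate) and is_aligned(va, candidate) and size >= candidate:
            return candidate
    return PAGE_SIZE
```

For example, if the first 0x40000 bytes of RAM are left out of the linear
mapping (an assumed layout), physical 0x80200000 would land at a virtual
address ending in 0x1c0000. pick_map_size(0x80200000, 0xff600000001c0000,
PMD_SIZE) then falls back to PAGE_SIZE, while a check on the physical
address alone would have picked a 2 MiB leaf.
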
> >>> Hi Alex,
> >>>
> >>> EDK2 marks opensbi range as reserved memory in EFI map. According to DT
> >>> spec, if the no-map is not set, we need to mark it as
> >>> EfiBootServicesData but EfiBootServicesData is actually considered as
> >>> free memory in kernel, as per UEFI spec. To avoid kernel using this
> >>> memory, we deviated from the DT spec for opensbi ranges.
> >> Violating specs is never the answer. Do one of:
> >>
> >> 1. Use no-map and take the performance hit
> >> 2. Exclude the memory range from /memory itself
> >> 3. Come up with a new no-access property that’s a weaker no-map
> >> (i.e. that allows mapping and speculative access) and uses
> >> EfiRuntimeServicesData in EFI land
> >>
> >> 2 feels most normal to me, personally, but all are fine.
> >>
> > Hi Jess,
> >
> > IMO, all the physical memory installed by the user should be visible.
> > Some part of the memory may be reserved and not available for the user
> > but excluding from /memory can cause issues.
> >
> > Whether we mark as EfiReservedMemory or EfiRuntimeServiceData, I think
> > it will be marked as no-map in memblock and can not be used by the OS
> > linear mapping. Alex can confirm this.
>
>
> Yes, I think you're right, EfiRuntimeServiceData will be marked as
> no-map anyway (see is_usable_memory()).
>
>
> >
> > So, my preference is option 1.
>
>
> Yes, again, I think you're right, this is feeling more and more like the
> most "natural" solution to me too, we are struggling for a performance
> benefit that was never proven...
>

I am inclined towards this option as well. Having gone through the
rationale for marking any /reserved-memory node without "no-map" as
EfiBootServicesData, I think this will trip up the kernel in the future,
if it is not happening already. Any region marked as EfiBootServicesData
will be available to the kernel for use after ExitBootServices.
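
To make the concern concrete, here is a rough Python sketch of how an OS
might decide which EFI regions become ordinary RAM after ExitBootServices.
The type numbers and the WB/UC attribute bits follow the UEFI
specification; the function and set names are made up for illustration,
and the logic is only my approximation of the is_usable_memory() check
mentioned above, not a copy of it.

```python
from enum import IntEnum


class EfiMemoryType(IntEnum):
    """EFI memory types, numbered as in the UEFI specification."""
    RESERVED = 0
    LOADER_CODE = 1
    LOADER_DATA = 2
    BOOT_SERVICES_CODE = 3
    BOOT_SERVICES_DATA = 4
    RUNTIME_SERVICES_CODE = 5
    RUNTIME_SERVICES_DATA = 6
    CONVENTIONAL = 7
    UNUSABLE = 8
    ACPI_RECLAIM = 9
    ACPI_NVS = 10
    MMIO = 11
    MMIO_PORT_SPACE = 12
    PAL_CODE = 13
    PERSISTENT = 14


EFI_MEMORY_UC = 0x1  # uncacheable
EFI_MEMORY_WB = 0x8  # write-back cacheable

# Hypothetical helper set: types an OS may reclaim after ExitBootServices.
RECLAIMABLE = frozenset({
    EfiMemoryType.LOADER_CODE,
    EfiMemoryType.LOADER_DATA,
    EfiMemoryType.BOOT_SERVICES_CODE,
    EfiMemoryType.BOOT_SERVICES_DATA,
    EfiMemoryType.CONVENTIONAL,
    EfiMemoryType.ACPI_RECLAIM,
    EfiMemoryType.PERSISTENT,
})


def usable_as_system_ram(mem_type, attributes):
    """True if the region can be handed to the allocator as normal RAM."""
    return mem_type in RECLAIMABLE and bool(attributes & EFI_MEMORY_WB)
```

Under this model a firmware region tagged BOOT_SERVICES_DATA with the WB
attribute is treated as free memory, while RESERVED or
RUNTIME_SERVICES_DATA regions are not.
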

Let's set no-map on the reserved memory region used by the firmware.
The fallout is that anybody running a kernel > 6.4 has to upgrade to a
firmware version that sets no-map correctly if they care about
hibernation or EFI booting. OpenSBI v1.3 is planned for this month anyway.
We can communicate the same to the rust-sbi project as well.

Any thoughts?

>
> >
> > Thanks,
> > Sunil
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>



--
Regards,
Atish