Re: [PATCH] riscv: head: use 0 as the default text_offset

From: Atish Kumar Patra
Date: Tue Nov 29 2022 - 04:16:44 EST


On Mon, Nov 28, 2022 at 10:55 PM Samuel Holland <samuel@xxxxxxxxxxxx> wrote:
>
> On 11/29/22 00:19, Palmer Dabbelt wrote:
> > On Mon, 28 Nov 2022 21:04:48 PST (-0800), samuel@xxxxxxxxxxxx wrote:
> >> On 11/28/22 14:11, Atish Kumar Patra wrote:
> >>> On Mon, Nov 28, 2022 at 7:34 AM Jisheng Zhang <jszhang@xxxxxxxxxx>
> >>> wrote:
> >>>>
> >>>> Commit 0f327f2aaad6 ("RISC-V: Add an Image header that boot loader can
> >>>> parse.") adds an image header which "is based on ARM64 boot image
> >>>> header and provides an opportunity to combine both ARM64 & RISC-V
> >>>> image headers in future.". At that time, arm64's default text_offset
> >>>> is 0x80000, this is to give "512 KB of guaranteed BSS space to put
> >>>> the swapper page tables" as commit cfa7ede20f13 ("arm64: set
> >>>> TEXT_OFFSET
> >>>> to 0x0 in preparation for removing it entirely") pointed out, but
> >>>> riscv doesn't need the space, so use 0 as the default text_offset.
> >>>>
> >>>> Before this patch, booting linux kernel on Sipeed bl808 M1s Dock
> >>>> with u-boot booti cmd:
> >>>> [ 0.000000] OF: fdt: Ignoring memory range 0x50000000 - 0x50200000
> >>>> ...
> >>>> [ 0.000000] DMA32 [mem 0x0000000050200000-0x0000000053ffffff]
> >>>> As can be seen, 2MB DDR(0x50000000 - 0x501fffff) can't be used by
> >>>> linux.
> >>>>
> >>>> After this patch, the 64MB DDR is fully usable by linux
> >>>> [ 0.000000] DMA32 [mem 0x0000000050000000-0x0000000053ffffff]
> >>>>
> >>>> Signed-off-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> >>>> ---
> >>>> arch/riscv/kernel/head.S | 12 +-----------
> >>>> 1 file changed, 1 insertion(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> >>>> index b865046e4dbb..ef95943f7a70 100644
> >>>> --- a/arch/riscv/kernel/head.S
> >>>> +++ b/arch/riscv/kernel/head.S
> >>>> @@ -38,18 +38,8 @@ ENTRY(_start)
> >>>> .word 0
> >>>> #endif
> >>>> .balign 8
> >>>> -#ifdef CONFIG_RISCV_M_MODE
> >>>> - /* Image load offset (0MB) from start of RAM for M-mode */
> >>>> + /* Image load offset (0MB) from start of RAM */
> >>>> .dword 0
> >>>> -#else
> >>>> -#if __riscv_xlen == 64
> >>>> - /* Image load offset(2MB) from start of RAM */
> >>>> - .dword 0x200000
> >>>> -#else
> >>>> - /* Image load offset(4MB) from start of RAM */
> >>>> - .dword 0x400000
> >>>> -#endif
> >>>> -#endif
> >>>
> >>> NACK.
> >>> RV64 needs to boot at a 2MB aligned address and RV32 needs to boot at
> >>> a 4MB aligned address.
> >>> The firmware is assumed to live at the start of DRAM for Linux running
> >>> in S-mode.
> >>
> >> What needs to happen so we can stop making this assumption? If the SBI
> >> implementation wants to reserve memory, it should use the devicetree to
> >> do so. OpenSBI already does this.
> >
> > IMO we've really screwed up the boot flow on RISC-V. Having Linux
> > reserve space for the firmware is just all backwards, Linux can't know
> > how much memory the firmware needs (which manifests under large hart
> > counts in OpenSBI, for example). Unfortunately there's no specification
> > that defines these platform-level details, so we're stuck depending on
> > unspecified behavior like this.
> >
> > I think we could fix this by either making Linux's early boot relocation
> > code work sanely (fix whatever bugs are there, document what can't be
> > fixed, and then add some sort of Image flag to tell firmware the kernel
> > can be relocated) or relying on relocatable firmware, but both of those
> > come with some costs ...
>
> It sounds like Alexandre's patch[1] lets us use memory below this
> offset, so we don't have to relocate the kernel to the beginning of RAM.
> In fact, we could even increase the offset if we are concerned about the
> kernel link address conflicting with the SBI implementation.
>
> [1]:
> https://patchwork.kernel.org/project/linux-riscv/patch/20221122084141.1849421-1-alexghiti@xxxxxxxxxxxx/
>
> >> Throwing away 2 MiB of RAM is quite wasteful considering we have
> >> multiple SoCs (D1s, BL808) that are limited to 64 MiB of in-package RAM.
> >
> > ... and I'd argue that users on systems don't want to pay those costs.
>
> What does fixing the early relocation code cost? Just longer boot time?
> If the bootloader takes care of avoiding reserved-memory regions, and
> Linux can run from wherever it gets loaded, that would be ideal to me.
>
> > In fact, I'd argue that systems like that don't want resident firmware
> > at all.
>
> I would much rather pay 256 KiB for resident firmware than reimplement
> all of the power management and PMU logic in Linux. It's not as bad as
> losing 2 MiB when I know most of that is unused.
>

There are also debug triggers, AP-TEE which are SBI extensions.
In addition to that we have steal time accounting (STA) SBI extension
in virtualization
use cases as well.

Note: The PMU requirement will probably no longer be if supervisor
counter delegation
extension is approved. But it will take some time for hardware to
actually implement that.

> > So let's just add a CONFIG_SBI=n, and then just use direct drivers for
> > everything. If the firmware doesn't need to be resident then it's
> > pretty straight-forward to support these 0 offsets, so we can just add
> > that as another Kconfig. Sure this will trip up firmware that depends
> > on these fixed reservations, but saying "the resident firmware fits in 0
> > superpages" is just as much of a platform-specific dependency as saying
> > "the resident firmware fits in 1 superpage". If firmware can't handle
> > this field in the Image format then we're going to end up with breakages
> > at some point, it might as well be now.
> >
> > If these systems don't have all the ISA bits necessary to avoid M-mode
> > entirely then we can just implement a tiny M-mode stub in Linux that
> > gets left around during early boot and then shims stuff to S-mode.
> > That'll be a bit of a headache and with some extensions it can be
> > avoided, the standard stuff won't allow for that until the latest round
> > of specs is done but if it's possible via whatever custom extensions are
> > in these things then that's probably the way to go.
>
> I don't think Linux has a choice here, when started in S-mode. And
> neither does the bootloader parsing the Image, because it most likely
> runs in S-mode as well.
>
> And when started in M-mode, we already don't use SBI.
>
> Regards,
> Samuel
>