Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()

From: Mark Rutland
Date: Mon Mar 25 2024 - 14:43:58 EST


On Mon, Mar 25, 2024 at 07:02:13PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 25, 2024, at 17:39, Mark Rutland wrote:
>
> > Using a compile-time constant TASK_SIZE_MAX allows the compiler to generate
> > much better code for access_ok(), and on arm64 we use a compile-time constant
> > even when our page table depth can change at runtime (and when native/compat
> > task sizes differ). The only absolute boundary that needs to be maintained is
> > that access_ok() fails for kernel addresses.
>
> As I understand, this works on arm64 and x86 because the kernel
> mapping starts on negative 64-bit addresses, so the highest user
> address (TASK_SIZE = 0x000fffffffffffff) is still smaller than the
> lowest kernel address (PAGE_OFFSET = 0xfff0000000000000).

Yep; the highest possible user address is always below the lowest possible
kernel address, and any "non-canonical" address between the two ranges faults.
There's some fun with TBI (Top Byte Ignore) and MTE (Memory Tagging Extension),
but that only affects how to mangle the pointer before the check, and doesn't
affect the definition of the VA boundary.
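To illustrate that TBI only changes how the pointer is mangled before the check, here is a minimal userspace sketch. It assumes a simplified model in which bits 63..56 are replaced with copies of bit 55 (roughly the idea behind arm64's untagged_addr()), plus a hypothetical 52-bit layout; sketch_untag(), sketch_access_ok() and sketch_tagged_access_ok() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: one past the highest user address in a 52-bit layout. */
#define TASK_SIZE_MAX (UINT64_C(1) << 52)

/* Every address below the limit has bit 55 clear, so untagging only drops the tag. */
static_assert(TASK_SIZE_MAX <= (UINT64_C(1) << 55),
	      "user addresses must have bit 55 clear");

/*
 * Simplified TBI model (an assumption, not the kernel's code): replace
 * bits 63..56 with copies of bit 55. User pointers have bit 55 clear, so
 * their tag is removed; kernel pointers have it set, so they stay high.
 */
static uint64_t sketch_untag(uint64_t addr)
{
	const uint64_t top_byte = UINT64_C(0xff) << 56;

	if (addr & (UINT64_C(1) << 55))
		return addr | top_byte;
	return addr & ~top_byte;
}

/* [addr, addr + size) must end at or below the limit, written to avoid overflow. */
static bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	return size <= TASK_SIZE_MAX && addr <= TASK_SIZE_MAX - size;
}

/* Untag first, then apply the same fixed boundary. */
static bool sketch_tagged_access_ok(uint64_t addr, uint64_t size)
{
	return sketch_access_ok(sketch_untag(addr), size);
}
```

In this model a user pointer carrying tag 0x2a in its top byte passes once untagged, while a kernel address whose top byte was cleared is sign-extended back to 0xff and still fails; the boundary itself never moves.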

In general, so long as TASK_SIZE_MAX is <= the lowest possible kernel address
and TASK_SIZE_MAX > the highest possible user address, it all works out.
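The two conditions above can be sketched in userspace C. The layout constants are hypothetical, borrowed from the arm64 figures Arnd quotes (highest user address 0x000fffffffffffff, lowest kernel address 0xfff0000000000000), and sketch_access_ok() is an illustrative stand-in modeled on the shape of the generic __access_ok() range check, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, taken from the arm64 numbers quoted above. */
#define HIGHEST_USER_ADDR  UINT64_C(0x000fffffffffffff)
#define LOWEST_KERNEL_ADDR UINT64_C(0xfff0000000000000)
#define TASK_SIZE_MAX      (UINT64_C(1) << 52)

/* The two conditions that make a compile-time TASK_SIZE_MAX safe. */
static_assert(TASK_SIZE_MAX > HIGHEST_USER_ADDR,
	      "every user address must be below TASK_SIZE_MAX");
static_assert(TASK_SIZE_MAX <= LOWEST_KERNEL_ADDR,
	      "no kernel address may be below TASK_SIZE_MAX");

/*
 * Accept [addr, addr + size) only if it ends at or below TASK_SIZE_MAX.
 * Written as "addr <= limit - size" so that addr + size cannot wrap.
 */
static bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	const uint64_t limit = TASK_SIZE_MAX;

	return size <= limit && addr <= limit - size;
}
```

Because the limit is a constant, the check compiles down to a couple of compares against immediates. If a layout let the user and kernel ranges overlap, as in the case Arnd describes next, no constant could satisfy both static_asserts and the build would fail, which is why such a layout would need a dynamic limit.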

> If an architecture ignores all the top bits of a virtual address,
> the largest TASK_SIZE would be higher than the smallest (positive,
> unsigned) PAGE_OFFSET, so you need TASK_SIZE_MAX to be dynamic.

Agreed, but do we even support such architectures within Linux?

> It doesn't look like this is the case on riscv, but I'm not sure
> about this part.

It looks like riscv is in the same bucket as arm64 and x86 per:

https://www.kernel.org/doc/html/next/riscv/vm-layout.html

.. which says:

| The RISC-V privileged architecture document states that the 64bit addresses
| "must have bits 63-48 all equal to bit 47, or else a page-fault exception
| will occur.": that splits the virtual address space into 2 halves separated
| by a very big hole, the lower half is where the userspace resides, the upper
| half is where the RISC-V Linux Kernel resides.
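The quoted rule is a sign-extension ("canonical address") check. A minimal sketch follows, parameterized by the number of VA bits. The Sv39 and Sv57 widths come from the RISC-V privileged specification rather than the page quoted above, and classify_va() plus the TASK_SIZE_MAX value of 1 << 56 are hypothetical, chosen only to show that one constant can sit in the hole for every paging mode.

```c
#include <assert.h>
#include <stdint.h>

enum va_half { VA_LOWER, VA_UPPER, VA_NONCANONICAL };

/* Highest lower-half (user) and lowest upper-half (kernel) address for a mode. */
#define HIGHEST_USER(bits)  ((UINT64_C(1) << ((bits) - 1)) - 1)
#define LOWEST_KERNEL(bits) (~UINT64_C(0) << ((bits) - 1))

/* Hypothetical constant limit; Sv57 is the binding case on both sides. */
#define TASK_SIZE_MAX (UINT64_C(1) << 56)

static_assert(TASK_SIZE_MAX > HIGHEST_USER(57), "above all Sv57 user addresses");
static_assert(TASK_SIZE_MAX <= LOWEST_KERNEL(57), "below all Sv57 kernel addresses");

/*
 * Bits 63..(va_bits - 1) must all be equal: all zero selects the lower
 * half, all one selects the upper half, anything else is non-canonical
 * and faults. Assumes va_bits is 39, 48 or 57.
 */
static enum va_half classify_va(uint64_t va, unsigned int va_bits)
{
	uint64_t top = va >> (va_bits - 1);               /* 65 - va_bits bits */
	uint64_t ones = (UINT64_C(1) << (65 - va_bits)) - 1;

	if (top == 0)
		return VA_LOWER;
	if (top == ones)
		return VA_UPPER;
	return VA_NONCANONICAL;
}
```

With Sv48, for example, 0x00007fffffffffff is the top of the lower half, 0xffff800000000000 is the bottom of the upper half, and everything in between faults. The hypothetical limit itself lands in that non-canonical hole even under Sv57, so a compile-time TASK_SIZE_MAX works the same way it does on arm64 and x86.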

Mark.