Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()

From: Samuel Holland
Date: Mon Mar 25 2024 - 15:20:57 EST


On 2024-03-25 1:30 PM, Mark Rutland wrote:
> On Mon, Mar 25, 2024 at 07:02:13PM +0100, Arnd Bergmann wrote:
>> On Mon, Mar 25, 2024, at 17:39, Mark Rutland wrote:
>>
>>> Using a compile-time constant TASK_SIZE_MAX allows the compiler to generate
>>> much better code for access_ok(), and on arm64 we use a compile-time constant
>>> even when our page table depth can change at runtime (and when native/compat
>>> task sizes differ). The only absolute boundary that needs to be maintained is
>>> that access_ok() fails for kernel addresses.
>>
>> As I understand, this works on arm64 and x86 because the kernel
>> mapping starts on negative 64-bit addresses, so the highest user
>> address (TASK_SIZE = 0x000fffffffffffff) is still smaller than the
>> lowest kernel address (PAGE_OFFSET = 0xfff0000000000000).
>
> Yep; the highest possible user address is always below the lowest possible
> kernel address, and any "non-canonical" address between the two ranges faults.
> There's some fun with TBI (Top Byte Ignore) and MTE, but that only affects how
> to mangle the pointer before the check, and doesn't affect the definition of
> the VA boundary.
>
> In general, so long as TASK_SIZE_MAX is <= the lowest possible kernel address
> and TASK_SIZE_MAX > the highest possible user address, it all works out.
>
>> If an architecture ignores all the top bits of a virtual address,
>> the largest TASK_SIZE would be higher than the smallest (positive,
>> unsigned) PAGE_OFFSET, so you need TASK_SIZE_MAX to be dynamic.
>
> Agreed, but do we even support such architectures within Linux?
>
>> It doesn't look like this is the case on riscv, but I'm not sure
>> about this part.
>
> It looks like riscv is in the same bucket as arm64 and x86 per:
>
> https://www.kernel.org/doc/html/next/riscv/vm-layout.html
>
> ... which says:
>
> | The RISC-V privileged architecture document states that the 64bit addresses
> | "must have bits 63-48 all equal to bit 47, or else a page-fault exception
> | will occur.": that splits the virtual address space into 2 halves separated
> | by a very big hole, the lower half is where the userspace resides, the upper
> | half is where the RISC-V Linux Kernel resides.

Right, and while RISC-V has a pointer masking extension[1] similar to arm64's
TBI, it will be handled[2] the same way: by sign extending the address prior to
checking against TASK_SIZE_MAX. So we maintain the property that userspace
addresses are always "positive" and kernel addresses are always "negative".
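
As a rough illustration of that ordering (sign-extend first, then compare
against a constant limit), here is a minimal userspace sketch. It is not the
code from [2]: the sketch_* names, the 47-bit limit, and the 7-bit tag width
are assumptions picked for the example, and it relies on the compiler
implementing right shifts of negative values as arithmetic shifts, as GCC and
Clang do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define SKETCH_TASK_SIZE_MAX (UINT64_C(1) << 47) /* assumed user VA limit */
#define SKETCH_TAG_BITS      7                   /* assumed masked top bits */

/*
 * Drop the tag by sign-extending from the highest untagged bit.
 * A user pointer (that bit clear) stays "positive"; a kernel pointer
 * (that bit set) stays "negative" whatever the tag bits were.
 * Assumes an arithmetic right shift on signed values.
 */
static inline uint64_t sketch_untag(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << SKETCH_TAG_BITS) >> SKETCH_TAG_BITS);
}

/*
 * Range check against a compile-time constant limit. Written as two
 * comparisons so that addr + size cannot overflow.
 */
static inline bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	addr = sketch_untag(addr);
	return size <= SKETCH_TASK_SIZE_MAX &&
	       addr <= SKETCH_TASK_SIZE_MAX - size;
}
```

With these assumed values, a tagged user pointer such as 0xfe00000000001000
untags to 0x1000 and passes, while 0x01ffffc000000000 (a kernel address with
its tag bits cleared) sign-extends back to 0xffffffc000000000 and fails the
check.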

Regards,
Samuel

[1]: https://github.com/riscv/riscv-j-extension/raw/a1e68469c60/zjpm-spec.pdf
[2]: https://lore.kernel.org/linux-riscv/20240319215915.832127-1-samuel.holland@xxxxxxxxxx/