Re: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()

From: Alexandre Ghiti
Date: Mon Mar 25 2024 - 09:14:33 EST


Hi Arnd,

On 24/03/2024 23:05, Arnd Bergmann wrote:
On Tue, Mar 19, 2024, at 17:51, Alexandre Ghiti wrote:
On 18/03/2024 22:29, Samuel Holland wrote:
On 2024-03-18 3:50 PM, Alexandre Ghiti wrote:
On Wed, Mar 13, 2024 at 7:00 PM Samuel Holland
It looks like the call to fixup_exception() [added
in 416721ff05fd ("riscv, mm: Perform BPF exhandler fixup on page fault")] is
only intended to catch null pointer dereferences. So making the change wouldn't
have any functional impact, but it would still be a valid optimization.

Or I was wondering if it would not be better to do like x86 and use an
alternative, it would be more correct (even though I believe your
solution works)
https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/page_64.h#L82.
What would be the benefit of using an alternative? Any access to an address
between TASK_SIZE and TASK_SIZE_MAX is guaranteed to generate a page fault, so
the only benefit I see is returning -EFAULT slightly faster at the cost of
applying a few hundred alternatives at boot. But it's possible I'm missing
something.

The use of alternatives allows to return right away if the buffer is
beyond the usable user address space, and it's not just "slightly
faster" for some cases (a very large buffer with only a few bytes being
beyond the limit or someone could fault-in all the user pages and fail
very late...etc). access_ok() is here to guarantee that such situations
don't happen, so actually it makes more sense to use an alternative to
avoid that.
The access_ok() function really wants a compile-time constant
value for TASK_SIZE_MAX so it can do constant folding for
repeated calls inside of one function, so for configurations
with a boot-time selected TASK_SIZE_64 it's already not ideal,
with or without alternatives.

If I read the current code correctly, riscv doesn't even
have a way to build with a compile-time selected
VA_BITS/PGDIR_SIZE, which is probably a better place to
start optimizing, since this rarely needs to be selected
dynamically.


Indeed, we do not support compile-time fixed VA_BITS! We could, but that would only be used for custom kernels. I don't think distro kernels will ever (?) propose 3 different kernels for sv39, sv48 and sv57 because the cost of dynamically choosing the address space width is not big enough to me (and the burden of maintaining 3 different kernels is).

Let me know if I'm wrong, I'd be happy to work on that.

Thanks,

Alex



Arnd