RE: [PATCH] riscv: Define TASK_SIZE_MAX for __access_ok()

From: David Laight
Date: Tue Mar 26 2024 - 06:20:23 EST


From: Arnd Bergmann
> Sent: 25 March 2024 20:38
>
> On Mon, Mar 25, 2024, at 19:30, Mark Rutland wrote:
> > On Mon, Mar 25, 2024 at 07:02:13PM +0100, Arnd Bergmann wrote:
> >> On Mon, Mar 25, 2024, at 17:39, Mark Rutland wrote:
> >
> >> If an architecture ignores all the top bits of a virtual address,
> >> the largest TASK_SIZE would be higher than the smallest (positive,
> >> unsigned) PAGE_OFFSET, so you need TASK_SIZE_MAX to be dynamic.
> >
> > Agreed, but do we even support such architectures within Linux?
>
> Apparently not.
>
> On 32-bit architectures, you often have TASK_SIZE==PAGE_OFFSET,
> but not on 64-bit -- either the top few bits in PAGE_OFFSET are
> always ones, or the user and kernel page tables are completely
> separate.

ISTR that arm64 uses (something like) bit 56 to select kernel
with the annoying 'feature' that the high bits can be ignored
just to complicate things.

But I also recall the people that want 'address masking' for x86-64
have been persuaded that addresses with the top bit set are invalid.
It has to be said that I'm not sure that aliasing user addresses
like that is a good idea.
If the TLB/PTE verified the masked bits it might be more use.

If bit63 selects kernel addresses (as in x86-64) there is a massive
(non-canonical address) gap before the first valid kernel address
that is larger than the user address space (and hence buffer size).
I think that lets access_ok() just fail if ((address | size) >> 60) != 0.
Although it probably means that you don't need to test 'size' at all
(unless some code probes the last byte of the buffer).
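
Roughly what I have in mind, as a sketch only. The helper name and the
constant are made up for illustration, this is not the kernel's
__access_ok(), and it assumes bit 63 selects kernel addresses and that the
user address space is much smaller than 2^60 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shift: any of the top four bits set means "reject". */
#define EXAMPLE_TOP_SHIFT 60

/* The test only works if it stays below the kernel-select bit (63). */
static_assert(EXAMPLE_TOP_SHIFT < 63, "shift must be below bit 63");

/*
 * Hypothetical range check for a 64-bit layout with a large
 * non-canonical gap between user and kernel addresses.
 */
static bool example_access_ok_64(uint64_t addr, uint64_t size)
{
    /*
     * If neither value has any of the top four bits set, both are
     * below 2^60, so addr + size is below 2^61: it cannot wrap and
     * cannot reach an address with bit 63 set.
     */
    return ((addr | size) >> EXAMPLE_TOP_SHIFT) == 0;
}
```

Anything that passes but lies above the real user limit lands in the
non-canonical gap, so the access itself faults rather than touching kernel
memory.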

For 32bit the user/kernel boundary is usually 0x80000000 or 0xc0000000
and there may not even be an invalid page between the two.
This does require that access_ok() checks the length (even for get_user()).
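
For comparison, a sketch of the 32-bit case, where the length has to be
part of the test. The 0xc0000000 split and the names are assumptions for
illustration; the real limit depends on the architecture and config:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed user/kernel boundary, illustration only. */
#define EXAMPLE_TASK_SIZE 0xc0000000u

/* The boundary is assumed to be page aligned (4k pages). */
static_assert((EXAMPLE_TASK_SIZE & 0xfffu) == 0, "boundary not page aligned");

/*
 * Hypothetical range check for a 32-bit layout with no guard gap:
 * the byte after the last user byte may be a valid kernel address,
 * so both the start and the length must be bounded.
 */
static bool example_access_ok_32(uint32_t addr, uint32_t size)
{
    /* addr + size is never computed, so the test cannot wrap. */
    return size <= EXAMPLE_TASK_SIZE && addr <= EXAMPLE_TASK_SIZE - size;
}
```

A 4-byte get_user() at 0xbffffffe passes an address-only test but would
read two bytes of kernel memory, which is why the length matters here.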

David
