Re: [GIT PULL] x86/mm for 6.4

From: Linus Torvalds
Date: Thu May 04 2023 - 13:10:48 EST


On Wed, May 3, 2023 at 11:28 PM Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 03, 2023 at 09:38:03AM -0700, Linus Torvalds wrote:
> > > Why does it do that "shift-by-63" game there, instead of making
> > > tlbstate_untag_mask just have bit #63 always set?
> >
> > And it turns out that bit #63 really _is_ always set, so I think the
> > solution to this all is to remove the sign games in untag_addr()
> > entirely.
>
> Untagging kernel pointer with LAM enabled will land it in the canonical
> hole which is safe as it leads to #GP on dereference.

You are entirely missing the point.

The GP fault does *NOT MATTER*.

Think of 'untagged_addr()' as a companion to - but absolutely *NOT* a
replacement for - 'access_ok()'.

You have three distinct cases:

(a) user-supplied valid user address

(b) user-supplied invalid user address (it high bit set)

(c) actual kernel address

and 'untagged_addr()' and 'access_ok()' work on the same basic input
domain: cases (a) and (b).

And the important thing for 'untagged_addr()' is that in case (a) it
needs to remove the tab bits, and in case (b) needs to *keep* the
address as an invalid user address.

Note that it does not need to keep the value the *same*. Nobody cares.

And also note that the resulting pointer may or may not GP-fault. Nobody cares.

Just to hit this home with a very explicit example, think of the case
where the kernel config for address masking isn't even enabled, and
'untagged_addr()' is a 1:1 map with no changes.

Doing 'untagged_addr()' on a valid user address results in a valid
user address. And doing it on an invalid user address results in an
invalid user address. GOOD.

Note that doing 'untagged_addr()' on an invalid user access does NOT
IN ANY WAY make that invalid address somehow "safe" and cause a GP
fault. Sure, it _might_ GP-fault. Or it might not. It might point to
random kernel data.

Verification of the address is simply not the job of
'untagged_addr()'. Never has been, never will be, and fundamentally
*cannot* be, since for the forseeable future 'untagged_addr()' is a
no-op for every single user outside of Intel.

Verification is separate. The verification is never "let's randomly
just access this pointer and see if it gets a GP-fault". No. We have a
user pointer, it needs *checking*. It needs to have something like
"use lookup_vma() to look up the vma that is associated with that
address". But it could also be something like "just do the range check
in GUP".

And that's why "keep an invalid user address as an invalid address",
because that *separate* stage of verifying the address needs to still
show that it's invalid.

Now, sometimes the "verification" might actually be "access_ok()"
itself, but honestly, if the address is used for an actual access,
then it shouldn't have gone through the 'untagged_addr()' thing at
all. It should just have been used as-is for the access. So normally
'untagged_addr()' does not get used *together* with 'access_ok()',
although that should obviously also work.

End result: all that really matters on x86-64 is that
'untagged_addr()' must keep the high bit as-is. That's the "this is
not a valid user address" bit. That's very much explicit in my current
series, of course, but even without my current series it was
implicitly the truth with those LAM patches (particularly considering
that 'untagged_addr()' didn't actually do the "keep kernel addresses
as-is" that it *thought* it did due to the signed type confusion).

So what about that (c) case? It doesn't matter. It's simply
fundamentally wrong for the kernel to pass an actual kernel address to
'untagged_addr()' and expect something useful back. It's a nonsensical
thing to do, and it's a huge bug.

So for the (c) case, the fact that the result would be useless and
*usually* GP-fault on access is a good thing. But it's not a
requirement, it's more of a "oh, cool, it's good that the nonsensical
operation causes quick failures".

So in that case, the GP fault is a "feature", but not a requirement.
Again, the normal 'untagged_addr()' case (of not changing the pointer
at all), obviously does *not* cause the kernel pointer corruption, but
maybe we could even have a debug mode. We could literally make
'untagged_addr()' do something like

#ifdef CONFIG_DEBUG_NONLAM
// Make sure nobody tries to use untagged_addr() on non-user addresses
#define untagged_addr(x) ((x) | (long)(x)>>63)
#endif

except obviously with the "get the types right and use 'x' only once"
thing (so the above #define is buggy, and puresly for conceptual
documentation purposes).

See?

Side note: one day, maybe we want to use address tagging *inside* the
kernel. However, that will not use 'untagged_addr()'. That would use
some *other* model for cleaning up kernel pointers when necessary.

Even on x86-64, the tagging rules for kernel and user space is
entirely different, in that user-space LAM rules are "U48" or "U57",
while kernel LAM rules depend on the paging mode, and the two are
*not* tied together.

But more importantly from a kernel perspective: regardless of hardware
implementations like that, the notion of masking bits off a untrusted
user pointer is simply completely *different* from the notion of
masking off bits of our own kernel pointers. You can see this in the
kernel everywhere: user pointers should be statically always user
pointers, and should look something like

struct iocb __user *user_iocb

which is very very different from a kernel pointer that doesn't have
that "__user" thing. Again, on some architectures, the *EXACT*SAME*
bit pattern pointer value may point to different things, because user
accesses are simply in a different address space entirely.

So if/when KASAN starts using LAM inside the kernel, it will do its
own things. It had better not use "untagged_addr()".

Linus