Re: PKU usage improvements for threads

From: Andy Lutomirski
Date: Fri Sep 02 2022 - 13:19:08 EST

On Thu, Aug 25, 2022, at 7:36 AM, Dave Hansen wrote:
> On 8/25/22 05:30, Stephen Röttger wrote:
>>>> We were also thinking about if this should be a more generic feature instead of
>>>> being tied to pkeys. I.e. the doc above has an alternative proposal to introduce
>>>> something like a memory seal/unseal syscall.
>>>> I was personally leaning towards using pkeys for this for a few reasons:
>>>> * intuitively it would make sense to me to extend PKEY_DISABLE_ACCESS
>>>> to also mean disable all changes to the memory, not just the data.
>>> It would make some sense, but we can't do it with the existing
>>> PKEY_DISABLE_ACCESS ABI. It would surely break existing users if they
>>> couldn't munmap() memory that was PKEY_DISABLE_ACCESS.
>> Our thought was that this could be opt-in with a prctl().

I know Linux never copies other OSes, but OpenBSD is considering this:

https://undeadly.org/cgi?action=article;sid=20220902100648

If it works well, we could implement it.

>
> So, today, you have this:
>
> foo = malloc(PAGE_SIZE);
> pkey_mprotect(foo, PAGE_SIZE, READ|WRITE, pkey=1);
> munmap(foo); // <-- works fine
> mmap(hint=foo, ...); // now attacker controls &foo
>
> Which is problematic. What you want instead is something like this:
>
> prctl(PR_ARCH_NO_MUNMAP_ON_PKEY); // or whatever
> foo = malloc(PAGE_SIZE);
> pkey_mprotect(foo, PAGE_SIZE, READ|WRITE, pkey=1);
> wrpkru(PKEY_DISABLE_ACCESS<<pkey*2);
> munmap(foo); // returns -EPERM (or whatever)
>
> Which requires the kernel to check when it's modifying a VMA (like the
> munmap() above) to see if PKRU _currently_ permits access to the VMA's
> contents. If not, the kernel should refuse to modify the VMA.
>
> Like I said, I don't think this is _insane_, but I can see it breaking
> perfectly innocent things. For instance, an app that today does a
> free() of pkey-assigned memory might work perfectly fine for a long time
> since that memory is rarely unmapped. But, the minute that malloc()
> decides it needs to zap the memory, *malloc()* will fail.
>
> I also wonder how far these semantics would go. Would madvise() work on
> these access-denied VMAs?
>
> My gut says that we don't want to mix up pkey semantics with this new
> mechanism.
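
To make the check described above concrete ("see if PKRU _currently_
permits access to the VMA's contents"), here is a rough userspace sketch
of the decision only. It is not kernel code and not a proposed patch.
The names pkru_bits_for_key() and sketch_may_modify_vma() are made up
for illustration, and the opted_in flag stands in for whatever the
prctl() would set. It assumes the x86 PKRU layout (two bits per key,
access-disable in the low bit) and only looks at access-disable; whether
write-disable should also block VMA changes is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PKRU layout: 16 keys, two bits per key. */
#define PKRU_BITS_PER_KEY 2
#define PKRU_NR_KEYS      16
#define PKRU_AD_BIT       0x1u  /* access-disable, same value as PKEY_DISABLE_ACCESS */
#define PKRU_WD_BIT       0x2u  /* write-disable, same value as PKEY_DISABLE_WRITE */

/* Extract the two permission bits for one key from a PKRU value. */
static uint32_t pkru_bits_for_key(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < PKRU_NR_KEYS);
    return (pkru >> (pkey * PKRU_BITS_PER_KEY)) & (PKRU_AD_BIT | PKRU_WD_BIT);
}

/*
 * Hypothetical policy: once the process has opted in, refuse to modify
 * a VMA (munmap, mprotect, ...) whose pkey is access-disabled in the
 * caller's current PKRU. Without the opt-in, nothing changes.
 */
static bool sketch_may_modify_vma(bool opted_in, uint32_t pkru, int vma_pkey)
{
    if (!opted_in)
        return true;  /* existing ABI: PKRU is not consulted */
    return !(pkru_bits_for_key(pkru, vma_pkey) & PKRU_AD_BIT);
}
```

With pkru = PKEY_DISABLE_ACCESS << (1 * 2), as in the wrpkru() line
quoted above, sketch_may_modify_vma(true, pkru, 1) is false (the
munmap() would be refused), while the same call with opted_in == false
is true, which is today's behavior.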