Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

From: Theo de Raadt
Date: Fri Oct 20 2023 - 12:27:36 EST


Stephen Röttger <sroettger@xxxxxxxxxx> wrote:

> > I do like us starting with just "mimmutable()", since it already
> > exists. Particularly if chrome already knows how to use it.
> >
> > Maybe add a flag field (require it to be zero initially) just to allow
> > any future expansion. Maybe the chrome team has *wanted* to have some
> > finer granularity thing and currently doesn't use mimmutable() in some
> > case?
>
> Yes, we do have a use case in Chrome to split the sealing into unmap and
> mprotect which will allow us to seal additional pages that we can't seal with
> pure mimmutable().

> For example, we have pkey-tagged RWX memory that we want to seal. Since
> the memory is already RWX and the pkey controls write access, we don't care
> about permission changes but sometimes we do need to mprotect data only
> pages.

Let me try to decompose this statement.

This is clearly for the JIT. You can pivot between the a JIT generated
code mapping being RW and RX (or X-only), the object will pivot between
W or X to satisfy W^X policy for safety.

I think you are talking about a RWX MAP_ANON object.

Then you use pkey_alloc() to get a PKEY. pkey_mprotect() sets the PKEY on
the region. I argue you can then make it entirely immutable / sealed.

Let's say it is fully immutable / sealed.

After which, you can change the in-processor PKU register (using pkey_set)
to toggle the Write-Inhibit and eXecute-Inhibit bits on that key.

The immutable object has a dangerous RWX permission. But the specific
PKEY making it either RX (or X-only) or RW depending upon your context.
The mapping is never exposed as RWX. The PKU model reduces the
permission access of the object below the immutable permission level.

The security depends on the PKEY WI/XI bits being difficult to control.

SADLY on x86, this is managed with a PKRU userland register which is changeble
without any supervisor control -- yes, it seems quite dangerous.
Changing it requires a complicated MSR dance. It is unfortunate that
the pkey_set() library function is easily reachedable in the PLT via ROP
methods. On non-x86 cpus that have similar functionality, the register
is privileged, but operating supporting it generally change it and
return immediately.

The problem you seem to have with fully locked mseal() in chrome seems
to be here:

> about permission changes but sometimes we do need to mprotect data only
> pages.

Does that data have to be in the same region? Can your allocator not
put the non-code pieces of the JIT elsewhere, with a different
permission, fully immutable / msealed -- and perhaps even managed with a
different PKEY if neccessary?

May that requires a huge rearchitecture. But isn't the root problem here
that data and code are being handled in the same object with a shared
permission model?

> But the munmap sealing will provide protection against implicit changes of the
> pkey in this case which would happen if a page gets unmapped and another
> mapped in its place.

That primitive feels so weird, I have a difficult time believing it will
remain unattackable in the long term.


But what if you could replace mprotect() with pkey_mprotect() upon a
different key.. ?

--

A few more notes comparing what OpenBSD has done compared to Linux:


In OpenBSD, we do not have the pkey library. We have stolen one of the
PKEY and use it for kernel support of xonly for kernel code and userland
code. On x86 we recognize that userland can flip the permission by whacking
the RPKU register -- which would make xonly code readable. (The chrome data
you are trying to guard faces the same problem).

To prevent that, a majority of traps in the kernel (page faults,
interrupts, etc) check if the PKRU register has been modified, and kill
the process. It is statistically strong.

We are not making pkey available as a userland feature, but if we later
do so we would still have 15 pkeys to play with. We would probably make
the pkey_set() operation a system call, so the trap handler can also
observe RPKU register modifications by the instruction.

Above, I mentioned pivoting between "RW or RX (or X-only)". On OpenBSD, chrome would be
able to pivot between RW and X-only.

When it comes to Pkey utilization, we've ended up in a very different
place than Linux.