Re: [PATCH v7 0/4] Introduce mseal()

From: Jeff Xu
Date: Wed Jan 24 2024 - 13:56:33 EST


On Tue, Jan 23, 2024 at 10:58 AM Theo de Raadt <deraadt@xxxxxxxxxxx> wrote:
>
> It's the same with MAP_MSEALABLE. I don't get it. So now there are 3
> memory types:
> - cannot be sealed, ever
> - not yet sealed
> - sealed
>
> What purpose does the first type serve? Please explain the use case.
>
> Today, processes have control over their entire address space.
>
> What is the purpose of "permissions cannot be locked". Please supply
> an example. If I am wrong, I'd like to know where I went wrong.
>
The linux example is in the V3 and V4 cover letter [1] [2] of the open
discussion section.

[1] https://lore.kernel.org/linux-mm/20231212231706.2680890-1-jeffxu@xxxxxxxxxxxx/T/
[2] https://lore.kernel.org/linux-mm/20240104185138.169307-3-jeffxu@xxxxxxxxxxxx/T/

Copied below for ease of reading.
-----------------------------------------------------------------------------------------
During the development of V3, I had new questions and thoughts and
wished to discuss.

1> shm/aio
>From reading the code, it seems to me that aio/shm can mmap/munmap
maps on behalf of userspace, e.g. ksys_shmdt() in shm.c. The lifetime
of those mapping are not tied to the lifetime of the process. If those
memories are sealed from userspace, then unmap will fail. This isn’t a
huge problem, since the memory will eventually be freed at exit or
exec. However, it feels like the solution is not complete, because of
the leaks in VMA address space during the lifetime of the process.

2> Brk (heap/stack)
Currently, userspace applications can seal parts of the heap by
calling malloc() and mseal(). This raises the question of what the
expected behavior is when sealing the heap is attempted.

let's assume following calls from user space:

ptr = malloc(size);
mprotect(ptr, size, RO);
mseal(ptr, size, SEAL_PROT_PKEY);
free(ptr);

Technically, before mseal() is added, the user can change the
protection of the heap by calling mprotect(RO). As long as the user
changes the protection back to RW before free(), the memory can be
reused.

Adding mseal() into picture, however, the heap is then sealed
partially, user can still free it, but the memory remains to be RO,
and the result of brk-shrink is nondeterministic, depending on if
munmap() will try to free the sealed memory.(brk uses munmap to shrink
the heap).

3> Above two cases led to the third topic:
There one option to address the problem mentioned above.
Option 1: A “MAP_SEALABLE” flag in mmap().
If a map is created without this flag, the mseal() operation will
fail. Applications that are not concerned with sealing will expect
their behavior to be unchanged. For those that are concerned, adding a
flag at mmap time to opt in is not difficult. For the short term, this
solves problems 1 and 2 above. The memory in shm/aio/brk will not have
the MAP_SEALABLE flag at mmap(), and the same is true for the heap.

If we choose not to go with path, all mapping will by default
sealable. We could document above mentioned limitations so devs are
more careful at the time to choose what memory to seal. I think
deny of service through mseal() by attacker is probably not a concern,
if attackers have access to mseal() and unsealed memory, then they can
also do other harmful thing to the memory, such as munmap, etc.

4>
I think it might be possible to seal the stack or other special
mappings created at runtime (vdso, vsyscall, vvar). This means we can
enforce and seal W^X for certain types of application. For instance,
the stack is typically used in read-write mode, but in some cases, it
can become executable. To defend against unintented addition of
executable bit to stack, we could let the application to seal it.

Sealing the heap (for adding X) requires special handling, since the
heap can shrink, and shrink is implemented through munmap().

Indeed, it might be possible that all virtual memory accessible to user
space, regardless of its usage pattern, could be sealed. However, this
would require additional research and development work.

-----------------------------------------------------------------------------------------------------