Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

From: Theo de Raadt
Date: Fri Oct 20 2023 - 11:55:39 EST


Stephen Röttger <sroettger@xxxxxxxxxx> wrote:

> > > IMO: The approaches mimmutable() and mseal() took are different, but
> > > we all want to seal the memory from attackers and make the linux
> > > application safer.
> >
> > I think you are building mseal for chrome, and chrome alone.
> >
> > I do not think this will work out for the rest of the application space
> > because
> >
> > 1) it is too complicated
> > 2) experience with mimmutable() says that applications don't do any of it
> > themselves, it is all in execve(), libc initialization, and ld.so.
> > You don't strike me as an execve, libc, or ld.so developer.
>
> We do want to build this in a way that it can be applied automatically by ld.so
> and we appreciate all your feedback on this.

Hi Stephen,

I am pretty sure your mechanism will be useable by ld.so.

What bothers me is that the complex many-bits approach may encourage people
to set only a subset of the bits, and then believe they have a security
primitive.

Partial sealing is not safe. I define partial sealing as "blocking munmap,
but not mprotect". Or "blocking mprotect, but not madvise or mmap".

In Message-id <ZS/3GCKvNn5qzhC4@xxxxxxxxxxxxxxxxxxxx> Matthew stated that
there are two aspects being locked: which object is mapped, and the
permission of that mapping. When the additional system calls msync() and
madvise() are included in the picture, there are 3 actions being prevented:

- Can someone replace the object
- Can someone change the permission
- Can someone throw away the cached pages, reverting to original
content of the object (that is the madvise / msync)
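
As a rough sketch of the first two actions, the C below builds one
read-only page and then attacks it with plain mmap(MAP_FIXED) and
mprotect(). The helper names (make_readonly_page, replace_object,
change_permission) are made up for illustration and the mapping is
anonymous rather than a real text or data segment; nothing here is
sealed, so both attempts are expected to succeed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: one page filled with 0xAA, then dropped to read-only. */
static unsigned char *make_readonly_page(size_t *len_out)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t)pagesz;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xAA, len);
    if (mprotect(p, len, PROT_READ) != 0) {
        munmap(p, len);
        return NULL;
    }
    *len_out = len;
    return p;
}

/* Action 1: replace the object by mapping over it with MAP_FIXED.
 * Returns 1 if the old contents are gone, 0 if not, -1 on setup failure. */
static int replace_object(void)
{
    size_t len;
    unsigned char *p = make_readonly_page(&len);
    if (p == NULL)
        return -1;

    unsigned char *q = mmap(p, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    int replaced = (q == p && p[0] == 0);   /* fresh zero page at same address */
    munmap(p, len);
    return replaced;
}

/* Action 2: change the permission back to writable and store to the page.
 * Returns 1 if the store went through, 0 if mprotect was refused, -1 on setup failure. */
static int change_permission(void)
{
    size_t len;
    unsigned char *p = make_readonly_page(&len);
    if (p == NULL)
        return -1;

    int changed = 0;
    if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0) {
        p[0] = 0x55;                        /* would fault if still read-only */
        changed = (p[0] == 0x55);
    }
    munmap(p, len);
    return changed;
}
```

Calling both functions from a small main() and checking the return
values is enough to see that blocking only one of the two paths still
leaves the page open through the other.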

In Message-id: <CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@xxxxxxxxxxxxxx>
Jan reminded us of this piece. I'm taking this as a long-standing security
hole in some sub-operations of msync/madvise which can write to data regions
that aren't actually writeable. Sub-operations with this problem are MADV_FREE,
MADV_DONTNEED, POSIX_MADV_DONTNEED, MS_INVALIDATE, on Linux MADV_WIPEONFORK,
and probably a whole bunch of others. I am testing OpenBSD changes which
demand PROT_WRITE permission for these sub-operations. Perhaps some systems
are already careful.
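
A minimal sketch of the madvise case, assuming Linux semantics for
MADV_DONTNEED on a private anonymous mapping (other systems may refuse
the call or keep the data). The function name is hypothetical. It fills
a page, removes PROT_WRITE, and then asks the kernel to discard the
page; if the first byte reads back as zero, the contents of a
non-writable region were changed without any write permission.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe.
 * Returns 1 if madvise(MADV_DONTNEED) threw away the contents of a
 * read-only page, 0 if the call failed or the data survived,
 * -1 if the mapping could not be set up. */
static int dontneed_discards_readonly(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t)pagesz;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAA, len);                   /* known non-zero contents */
    if (mprotect(p, len, PROT_READ) != 0) { /* region is now read-only */
        munmap(p, len);
        return -1;
    }

    /* No PROT_WRITE on the region, yet we ask for the pages to be dropped. */
    int rc = madvise(p, len, MADV_DONTNEED);
    int discarded = (rc == 0 && p[0] == 0); /* zero-fill means data is gone */

    munmap(p, len);
    return discarded;
}
```

On a system that demands PROT_WRITE for this sub-operation, the
madvise() call would be expected to fail and the function to return 0.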

If you leave any of these operators available, the object is not actually sealed
against abuse. I believe an attacker will simply switch to a different operator
(mmap, munmap, mprotect, madvise, msync) to achieve a similar objective of
damaging the permission or contents.
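
The argument can be sketched as a toy model in C. The bit names below
are invented for this illustration, one per operator listed above, and
are NOT the flags of the proposed mseal() interface. The only point is
that any mask short of "all bits" leaves at least one operator for the
attacker to switch to.

```c
#include <assert.h>

/* Hypothetical bits, one per operator named in the text.
 * These are not the proposed mseal() flags. */
enum block_bit {
    BLOCK_MMAP     = 1 << 0,
    BLOCK_MUNMAP   = 1 << 1,
    BLOCK_MPROTECT = 1 << 2,
    BLOCK_MADVISE  = 1 << 3,
    BLOCK_MSYNC    = 1 << 4,
    BLOCK_ALL      = (1 << 5) - 1
};

/* Operators an attacker can still reach for a given set of blocked bits. */
static unsigned open_operators(unsigned blocked)
{
    assert((blocked & ~(unsigned)BLOCK_ALL) == 0);  /* only known bits */
    return (unsigned)BLOCK_ALL & ~blocked;
}

/* In this model a region counts as sealed only when nothing is left open. */
static int is_sealed(unsigned blocked)
{
    return open_operators(blocked) == 0;
}
```

For example, is_sealed(BLOCK_MUNMAP | BLOCK_MPROTECT) is 0 in this
model, because mmap, madvise and msync remain in open_operators();
only is_sealed(BLOCK_ALL) is 1.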

Since mseal() is designed to create partial sealings, the name of the proposed
system call really smells.

> The intention of splitting the sealing by syscall was to provide
> flexibility while still allowing ld.so to seal all operations.

Yes, you will have ld.so set all the bits, and the same in C runtime
initialization. If you convince glibc to stop making the stack executable
in dlopen(), the kernel could automatically do it. With Linux backwards
compat management, getting there would be an extremely long long long
roadmap. But anyways the idea would be "set all the bits". Because otherwise
the object or data isn't safe.

> Does Linus' proposal to just split munmap / mprotect sealing address your
> complexity concerns? ld.so would always use both flags which should then behave
> similar to mimmutable().

No, I think it is weak, because it isn't sealed.

A separate mail in the thread from you says this is about chrome wanting
to use PKU on RWX objects. I think that's the reason for wanting to
separate the sealing (I haven't heard of other applications wanting that).
How about we explore that in the other subthread.