Re: [RFC PATCH v2 5/8] mseal: Check seal flag for munmap(2)

From: Jeff Xu
Date: Wed Oct 18 2023 - 13:14:43 EST


On Wed, Oct 18, 2023 at 8:08 AM Jeff Xu <jeffxu@xxxxxxxxxx> wrote:
>
> On Tue, Oct 17, 2023 at 9:54 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, 17 Oct 2023 at 02:08, <jeffxu@xxxxxxxxxxxx> wrote:
> > >
> > > Of all the call paths that call into do_vmi_munmap(),
> > > this is the only place where checkSeals = MM_SEAL_MUNMAP.
> > > The rest has checkSeals = 0.
> >
> > Why?
> >
> > None of this makes sense.
> >
> > So you say "we can't munmap in this *one* place, but all others ignore
> > the sealing".
> >
> I apologize that previously, I described what this code does, and not reasoning.
>
> In our threat model, as Stephen Röttger point out in [1], and I quote:
>
> V8 exploits typically follow a similar pattern: an initial bug leads
> to memory corruption but often the initial corruption is limited and
> the attacker has to find a way to arbitrarily read/write in the whole
> address space.
>
> The memory correction is in the user space process, e.g. Chrome.
> Attackers will try to modify permission of the memory, by calling
> mprotect, or munmap then mmap to the same address but with different
> permission, etc.
>
> Sealing blocks mprotect/munmap/mremap/mmap call from the user space
> process, e.g. Chrome.
>
> At time of handling those 4 syscalls, we need to check the seal (
> can_modify_mm), this requires locking the VMA (
> mmap_write_lock_killable), and ideally, after validating the syscall
> input. The reasonable place for can_modify_mm() is from utility
> functions, such as do_mmap(), do_vmi_munmap(), etc.
>
> However, there is no guarantee that do_mmap() and do_vmi_munmap() are
> only reachable from mprotect/munmap/mremap/mmap syscall entry point
> (SYSCALL_DEFINE_XX). In theory, the kernel can call those in other
> scenarios, and some of them can be perfectly legit. Those other
> scenarios are not covered by our threat model at this time. Therefore,
> we need a flag, passed from the SYSCALL_DEFINE_XX entry , down to
> can_modify_mm(), to differentiate those other scenarios.
>
> Now, back to code, it did some optimization, i.e. doesn't pass the
> flag from SYSCALL_DEFINE_XX in all cases. If SYSCALL_DEFINE_XX calls
> do_a, and do_a has only one caller, I will set the flag in do_a,
> instead of SYSCALL_DEFINE_XX. Doing this reduces the size of the
> patchset, but it also makes the code less readable indeed. I could
> remove this optimization in V3. I welcome suggestions to improve
> readability on this.
>
> When handing the mmap/munmap/mremap/mmap, once the code passed
> can_modify_mm(), it means the memory area is not sealed, if the code
> continues to call the other utility functions, we don't need to check
> the seal again. This is the case for mremap(), the seal of src address
> and dest address (when applicable) are checked first, later when the
> code calls do_vmi_munmap(), it no longer needs to check the seal
> again.
>
> [1] https://v8.dev/blog/control-flow-integrity
>
> -Jeff

There is also alternative approach:

For all the places that call do_vmi_munmap(), find out which
case should ignore the sealing flag legitimately, set an ignore_seal
flag and pass it down into do_vmi_munmap(). For the rest case,
use default behavior.

All future API will automatically be covered for sealing, by using default.

The risky side, if I missed a case that requires setting ignore_seal,
there will be a bug.

Also if a driver calls the utility functions to unmap a memory, the
seal will be checked as well. (Driver is not in our threat model,
but Chrome probably doesn't mind it.)

Which of those two approaches are better ? I appreciate the direction on this.

Thanks!
-Jeff


-Jeff