Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl

From: Catalin Marinas
Date: Thu Nov 10 2022 - 07:03:40 EST


On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote:
> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote:
> > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 099468aee4d8..42eaf6683216 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > vm_flags |= VM_NORESERVE;
> > > }
> > >
> > > + if (map_deny_write_exec(NULL, vm_flags))
> > > + return -EACCES;
> > > +
> >
> > This seems like the wrong place to do the check -- that the vma argument
> > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't
> > it live in mmap_region()? What happens with MAP_FIXED, when there is
> > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended
> > check. For example, we had "c" above:
> >
> > c) mmap(PROT_READ);
> > mprotect(PROT_READ|PROT_EXEC); // fails
> >
> > But this would allow another case:
> >
> > e) addr = mmap(..., PROT_READ, ...);
> > mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes
>
> I can move the check into mmap_region() but it won't fix the MAP_FIXED
> example that you showed here.
>
> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions.
> However the `vma` for the 'old' region is not kept around, and a new vma will
> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set
> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags)
> will just be as good as passing NULL.
>
> It's possible to save the vm_flags from the region that is unmapped, but Catalin
> suggested it might be better if that is part of a later extension, what do you
> think?

I thought initially we should keep the behaviour close to what systemd
achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the
vma is already executable (i.e. check actual permission change not just
the PROT_* flags).

We could pass the old vm_flags for that region (and maybe drop the vma
pointer entirely, just check old and new vm_flags). But this feels like
tightening slightly systemd's MDWE approach. If user-space doesn't get
confused by this, I'm fine to go with it. Otherwise we can add a new
flag later for this behaviour

I guess that's more of a question for Topi on whether point tightening
point (e) is feasible/desirable.

--
Catalin