Re: [PATCH v2 0/5] mm/memfd: MFD_NOEXEC for memfd_create

From: Jeff Xu
Date: Wed Nov 02 2022 - 13:18:32 EST


On Tue, Nov 1, 2022 at 7:45 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> On Tue, Nov 01, 2022 at 04:14:39PM -0700, Jeff Xu wrote:
> > Sorry for the long overdue reply.
>
> No worries! I am a fan of thread necromancy. :)
>
> > [...]
> > 1> memfd_create:
> > Add two flags:
> > #define MFD_EXEC 0x0008
> > #define MFD_NOEXEC_SEAL 0x0010
> > This lets application to set executable bit explicitly.
> > (If application set both, it will be rejected)
>
> So no MFD_NOEXEC without seal? (I'm fine with that.)
>
Right, there is no MFD_NOEXEC without the seal: a memfd can be chmod'ed
to add x after creation, so an unsealed NOEXEC would not be secure.

There is no MFD_EXEC_SEAL either: it is better to apply the w and x seals
in the same function call, and the w seal can't be applied at creation time.

> > 2> For old application that doesn't set executable bit:
> > Add a pid name-spaced sysctl.kernel.pid_mfd_noexec, with:
>
> bikeshed: vm.memfd_noexec
> (doesn't belong in "kernel", and seems better suited to "vm" than "fs")
>
Sounds good, will use vm.memfd_noexec.

> > value = 0: Default_EXEC
> > Honor MFD_EXEC and MFD_NOEXEC_SEAL
> > When none is set, will fall back to original behavior (EXEC)
>
> Yeah. Rephrasing for myself to understand more clearly:
>
> "memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL act like MFD_EXEC
> was set."
>
> > value = 1: Default_NOEXEC_SEAL
> > Honor MFD_EXEC and MFD_NOEXEC_SEAL
> > When none is set, will default to MFD_NOEXEC_SEAL
>
> "memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL act like
> MFD_NOEXEC_SEAL was set."
>
Copy, this is clearer. Thanks.

> Also, I think there should be a pr_warn_ratelimited() when
> memfd_create() is used without either bit, so that there is some
> pressure to please adjust their API calls to explicitly set a bit.
>
Sure

> > 3> Add a pid name-spaced sysctl kernel.pid_mfd_noexec_enforced: with:
> > value = 0: default, not enforced.
> > value = 1: enforce NOEXEC_SEAL (overwrite everything)
>
> How about making this just mode "value 2" for the first sysctl?
> "memfd_create() without MFD_NOEXEC_SEAL will be rejected."
>
Good point. Having the kernel silently override the caller's flags might
not be good practice. I will add this as a third mode of vm.memfd_noexec:
value = 2: "memfd_create() without MFD_NOEXEC_SEAL will be rejected."
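
To make sure we mean the same thing, here is a rough userspace sketch of
the flag resolution for the three vm.memfd_noexec values as discussed so
far. It is only an illustration, not the patch: the PROPOSED_* names, the
enum, the function name, and the choice of -EINVAL for the rejected cases
are all assumptions, and the flag values are simply the ones proposed
earlier in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as proposed in this thread (assumption: may change later). */
#define PROPOSED_MFD_EXEC        0x0008U
#define PROPOSED_MFD_NOEXEC_SEAL 0x0010U

/* Hypothetical names for the three vm.memfd_noexec modes. */
enum memfd_noexec_mode {
	MEMFD_DEFAULT_EXEC = 0,        /* no bit set: act like MFD_EXEC */
	MEMFD_DEFAULT_NOEXEC_SEAL = 1, /* no bit set: act like MFD_NOEXEC_SEAL */
	MEMFD_NOEXEC_ENFORCED = 2,     /* reject anything without MFD_NOEXEC_SEAL */
};

/*
 * Resolve the exec-related bits of a memfd_create() flags word.
 * Returns 0 and stores the effective flags in *out, or a negative
 * errno (assumed to be -EINVAL here) when the call would be rejected.
 */
int memfd_resolve_exec_flags(unsigned int flags, int mode, unsigned int *out)
{
	unsigned int both = PROPOSED_MFD_EXEC | PROPOSED_MFD_NOEXEC_SEAL;
	unsigned int exec_bits = flags & both;

	if (mode < MEMFD_DEFAULT_EXEC || mode > MEMFD_NOEXEC_ENFORCED)
		return -EINVAL;

	/* Setting both bits is contradictory and is always rejected. */
	if (exec_bits == both)
		return -EINVAL;

	/* Mode 2: reject instead of silently overriding the caller. */
	if (mode == MEMFD_NOEXEC_ENFORCED &&
	    !(flags & PROPOSED_MFD_NOEXEC_SEAL))
		return -EINVAL;

	/* Neither bit set: fall back to the sysctl-selected default. */
	if (exec_bits == 0)
		flags |= (mode == MEMFD_DEFAULT_EXEC) ?
			 PROPOSED_MFD_EXEC : PROPOSED_MFD_NOEXEC_SEAL;

	*out = flags;
	return 0;
}

/* A few example calls, one per case discussed above. */
void memfd_policy_examples(void)
{
	unsigned int eff = 0;

	/* value = 0, old application: behaves as if MFD_EXEC was set. */
	assert(memfd_resolve_exec_flags(0, 0, &eff) == 0);
	assert(eff == PROPOSED_MFD_EXEC);

	/* value = 1, old application: behaves as if MFD_NOEXEC_SEAL was set. */
	assert(memfd_resolve_exec_flags(0, 1, &eff) == 0);
	assert(eff == PROPOSED_MFD_NOEXEC_SEAL);

	/* value = 2: an explicit MFD_EXEC is rejected, not overridden. */
	assert(memfd_resolve_exec_flags(PROPOSED_MFD_EXEC, 2, &eff) == -EINVAL);
}
```

Note that with value = 2 an old application that sets neither bit is
rejected as well, which follows from the wording above.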

Thanks!
Jeff

> -Kees
>
> --
> Kees Cook