Re: [PATCH v8 3/5] mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC

From: Dominique Martinet
Date: Wed Jun 28 2023 - 15:32:13 EST


Dominique Martinet wrote on Wed, Jun 28, 2023 at 08:42:41PM +0900:
> If flags already has either MFD_EXEC or MFD_NOEXEC_SEAL, you don't check
> the sysctl at all.
> [...repro snipped..]
>
> What am I missing?

(Perhaps the intent is just to force people to use the flag so it is
easier to check for memfd_create in seccomp or an LSM?
But I don't see why such a check couldn't consider the absence of a flag
as well, so I don't see the point.)


> BTW I find the current behaviour rather hard to use: setting this to 2
> should still set NOEXEC by default in my opinion, just refuse anything
> that explicitly requested EXEC.

And I just noticed it's not possible to lower the value despite having
CAP_SYS_ADMIN: what the heck?! I have never seen such a sysctl and it
just forced me to reboot because I willy-nilly tested in the init pid
namespace, and quite a few applications that don't require exec broke
exactly as I described below.

If the user has CAP_SYS_ADMIN there are more container escape methods
than I can count; this is basically a free pass to root on the main
namespace anyway, so you're not protecting anything. Please let people
set the sysctl to what they want.

> Sure there's a warn_once that memfd_create was used without seal, but
> right now on my system it's "used up" 5 seconds after boot by systemd:
> [ 5.854378] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1 'systemd'
>
> And anyway, older kernels will barf up EINVAL when calling memfd_create
> with MFD_NOEXEC_SEAL, so even if userspace will want to adapt they'll
> need to try calling memfd_create with the flag once and retry on EINVAL,
> which let's face it is going to take a while to happen.
> (Also, the flag has been added to glibc, but not in any release yet)
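
(For illustration, the try-once-then-retry-on-EINVAL dance could look
roughly like the sketch below. memfd_create_noexec() is a made-up helper
name, not something from this series or from glibc, and the fallback
#define assumes the uapi value of MFD_NOEXEC_SEAL for headers that don't
have it yet. Note EINVAL can also mean a bad name or bad flags, in which
case the retry just fails the same way.)

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Headers from before the flag existed won't define it; this is the
 * value from the kernel uapi header (assumption if your tree differs). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/* Hypothetical helper: ask for a non-executable, sealed memfd and fall
 * back to a plain memfd_create() on kernels that reject the new flag. */
static int memfd_create_noexec(const char *name, unsigned int flags)
{
	int fd = memfd_create(name, flags | MFD_NOEXEC_SEAL);

	/* Older kernels return EINVAL for flags they don't know about. */
	if (fd < 0 && errno == EINVAL)
		fd = memfd_create(name, flags);

	return fd;
}
```

A caller would use it as a drop-in, e.g.
memfd_create_noexec("buf", MFD_CLOEXEC), and every caller in every
project has to grow something like this, which is the part that will
take a while.
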
>
> Making calls default to noexec AND refuse exec does what you want
> (forbid use of exec in an app that wasn't in a namespace that allows
> exec) while allowing apps that require it to work; that sounds better
> than making all applications that haven't taken the pain of adding the
> new flag fail, to me.
> Well, I guess an app that did require exec without setting the flag will
> fail in a weird place instead of failing at memfd_create and having a
> chance to fallback, so it's not like it doesn't make any sense;
> I don't have such strong feelings about this if the sysctl works, but
> for my use case I'm more likely to want to take a chance at memfd_create
> not needing exec than having the flag set. Perhaps a third value if I
> cared enough...
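
To make the suggestion quoted above concrete, here is a small userspace
model of the flag resolution I have in mind. This is only a sketch of
the proposed semantics, not the code in the patch: resolve_memfd_flags()
is a made-up name, the -EACCES return is my assumption, and I'm assuming
sysctl values 0/1/2 mean "default exec", "default noexec+seal" and
"enforce noexec" respectively.

```c
#include <errno.h>

/* Values from the kernel uapi header, defined here so the model
 * builds against older userspace headers as well. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/* Hypothetical model: returns the effective flags, or a negative errno.
 * sysctl is the assumed vm.memfd_noexec value (0, 1 or 2). */
static long resolve_memfd_flags(unsigned int flags, int sysctl)
{
	/* No explicit choice: pick the default from the sysctl, and
	 * default to noexec for both 1 and 2. */
	if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)))
		flags |= (sysctl == 0) ? MFD_EXEC : MFD_NOEXEC_SEAL;

	/* Enforcing mode: refuse only an explicit request for exec.
	 * The error code is an assumption. */
	if (sysctl == 2 && (flags & MFD_EXEC))
		return -EACCES;

	return flags;
}
```

With this, an unflagged memfd_create under sysctl=2 silently becomes
noexec+sealed, and only callers passing MFD_EXEC get an error they can
fall back from.
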


--
Dominique Martinet | Asmadeus