Re: [PATCH v2 4/5] memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy

From: Dominique Martinet
Date: Wed Aug 16 2023 - 01:45:25 EST


Jeff Xu wrote on Tue, Aug 15, 2023 at 10:13:18PM -0700:
> > Given that it is possible for CAP_SYS_ADMIN users to create executable
> > binaries without memfd_create(2) and without touching the host
> > filesystem (not to mention the many other things a CAP_SYS_ADMIN process
> > would be able to do that would be equivalent or worse), it seems strange
> > to cause a fair amount of headache to admins when there doesn't appear
> > to be an actual security benefit to blocking this. There appear to be
> > concerns about confused-deputy-esque attacks[2] but a confused deputy that
> > can write to arbitrary sysctls is a bigger security issue than
> > executable memfds.
> >
> Something to point out: The demo code might be enough to prove your
> case in other distributions, however, in ChromeOS, you can't run this
> code. The executable in ChromeOS are all from known sources and
> verified at boot.
> If an attacker could run this code in ChromeOS, that means the
> attacker already acquired arbitrary code execution through other ways,
> at that point, the attacker no longer needs to create/find an
> executable memfd, they already have the vehicle. You can't use an
> example of an attacker already running arbitrary code to prove that
> disable downgrading is useless.
> I agree it is a big problem that an attacker already can modify a
> sysctl. Assuming this can happen by controlling arguments passed into
> sysctl, at the time, the attacker might not have full arbitrary code
> execution yet, that is the reason the original design is so
> restrictive.

I don't understand how you can say an attacker cannot run arbitrary code
within a process here, yet assert that they'd somehow run memfd_create +
execveat on it if this sysctl is lowered -- the two look equivalent to
me?

CAP_SYS_ADMIN is a kludge of a capability that pretty much gives root as
soon as you can run arbitrary code (just have a look at the various
container escape examples when the capability is given); I see little
point in trying to harden just this here.
It'd make more sense to limit all sysctl modifications in the context
you're thinking of through e.g. SELinux or another LSM.

(in the context of users making their own containers, my suggestion is
to never use CAP_SYS_ADMIN or, if they must, to give it only to a
separate minimal container where they can limit user interaction)


FWIW, I also think the proposed =2 behaviour makes more sense, but this
is something we already discussed last month so I won't come back to it
as I'm not really involved here.

--
Dominique Martinet | Asmadeus