Re: [PATCH v1 0/7] Remove in-tree usage of MAP_DENYWRITE

From: Eric W. Biederman
Date: Thu Aug 12 2021 - 13:49:24 EST


"Andy Lutomirski" <luto@xxxxxxxxxx> writes:

> On Thu, Aug 12, 2021, at 10:32 AM, Eric W. Biederman wrote:
>> David Hildenbrand <david@xxxxxxxxxx> writes:
>>
>> > This series is based on v5.14-rc5 and corresponds code-wise to the
>> > previously sent RFC [1] (the RFC still applied cleanly).
>> >
>> > This series removes all in-tree usage of MAP_DENYWRITE from the kernel
>> > and removes VM_DENYWRITE. We stopped supporting MAP_DENYWRITE for
>> > user space applications a while ago because of the chance for DoS.
>> > The last renaming user is binfmt binary loading during exec and
>> > legacy library loading via uselib().
>> >
>> > With this change, MAP_DENYWRITE is effectively ignored throughout the
>> > kernel. Although the net change is small, I think the cleanup in mmap()
>> > is quite nice.
>> >
>> > There are some (minor) user-visible changes with this series:
>> > 1. We no longer deny write access to shared libaries loaded via legacy
>> > uselib(); this behavior matches modern user space e.g., via dlopen().
>> > 2. We no longer deny write access to the elf interpreter after exec
>> > completed, treating it just like shared libraries (which it often is).
>> > 3. We always deny write access to the file linked via /proc/pid/exe:
>> > sys_prctl(PR_SET_MM_EXE_FILE) will fail if write access to the file
>> > cannot be denied, and write access to the file will remain denied
>> > until the link is effectivel gone (exec, termination,
>> > PR_SET_MM_EXE_FILE) -- just as if exec'ing the file.
>> >
>> > I was wondering if we really care about permanently disabling write access
>> > to the executable, or if it would be good enough to just disable write
>> > access while loading the new executable during exec; but I don't know
>> > the history of that -- and it somewhat makes sense to deny write access
>> > at least to the main executable. With modern user space -- dlopen() -- we
>> > can effectively modify the content of shared libraries while being
>> > used.
>>
>> So I think what we really want to do is to install executables with
>> and shared libraries without write permissions and immutable. So that
>> upgrades/replacements of the libraries and executables are forced to
>> rename or unlink them. We need the immutable bit as CAP_DAC_OVERRIDE
>> aka being root ignores the writable bits when a file is opened for
>> write. However CAP_DAC_OVERRIDE does not override the immutable state
>> of a file.
>
> If we really want to do this, I think we'd want a different flag
> that's more like sealed. Non-root users should be able to do this,
> too.
>
> Or we could just more gracefully handle users that overwrite running
> programs.

I had a blind spot, and Florian Weimer made a very reasonable request.
Apparently userspace for shared libraires uses MAP_PRIVATE.

So we almost don't care if the library is overwritten. We loose some
efficiency and apparently there are some corner cases like the library
being extended past the end of the exiting file that are problematic.

Given that MAP_PRIVATE for shared libraries is our strategy for handling
writes to shared libraries perhaps we just need to use MAP_POPULATE or a
new related flag (perhaps MAP_PRIVATE_NOW) that just makes certain that
everything mapped from the executable is guaranteed to be visible from
the time of the mmap, and any changes from the filesystem side after
that are guaranteed to cause a copy on write.

Once we get that figured out we could consider getting rid of deny-write
entirely.

Eric