Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic

From: Michal Hocko
Date: Wed May 11 2016 - 07:32:35 EST


On Wed 11-05-16 13:07:33, Peter Zijlstra wrote:
>
>
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@xxxxxxx>
> >
> > MAP_LOCKED has had a subtly different semantic from mmap(2)+mlock(2)
> > ever since it was introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> >
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > other than the originally mapped area.
> >
> > E.g. a speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard a portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, the mmap man page doesn't explicitly describe
> > requirements for threaded applications, so we cannot exclude this
> > possibility.
> >
> > This patch makes the semantic of MAP_LOCKED explicit and suggests using
> > mmap + mlock as the only way to guarantee no later major page faults.
> >
>
> URGH, this really blows chunks. It basically means MAP_LOCKED is pointless
> cruft and we might as well remove it.

Yeah, the usefulness of MAP_LOCKED is somewhat reduced. Everybody who
wants the full semantic really has to use mlock(2).
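
For illustration, a minimal userspace sketch of the mmap + mlock pattern
might look like the following. The helper name map_locked_strict() is
made up for this example and is not an existing API; it assumes an
anonymous private mapping, and _DEFAULT_SOURCE is only there to expose
MAP_ANONYMOUS on glibc.

```c
#define _DEFAULT_SOURCE          /* assumption: glibc, exposes MAP_ANONYMOUS */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not an existing API): map len bytes of anonymous
 * memory and lock them with mlock(2) instead of passing MAP_LOCKED.
 * Returns the mapping on success, or NULL with errno set on failure.
 */
static void *map_locked_strict(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/*
	 * Unlike mmap(MAP_LOCKED), mlock(2) returns an error if the
	 * range cannot be populated and locked, so the caller sees it.
	 */
	if (mlock(p, len) != 0) {
		int saved = errno;

		munmap(p, len);      /* do not leak the unlocked mapping */
		errno = saved;       /* report the mlock failure */
		return NULL;
	}
	return p;
}
```

A caller that gets NULL back can inspect errno (e.g. ENOMEM or EAGAIN
from mlock) and decide whether to retry, fall back to unlocked memory,
or bail out, which is exactly the information MAP_LOCKED does not give.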

> Why not fix it proper?

I have tried, but it turned out to be a problem because we drop
mmap_sem after we have initialized the VMA and, as Linus pointed out,
there are multithreaded applications which do opportunistic memory
management[1]. So we would have to hold mmap_sem for write during
the whole VMA setup + population, and that doesn't seem to be worth
all the trouble when we are not even sure whether anybody relies on
MAP_LOCKED to have the hard mlock semantic.

---
[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@xxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs