Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK

From: Christoph Lameter
Date: Fri May 24 2013 - 11:40:34 EST


On Fri, 24 May 2013, Peter Zijlstra wrote:

> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> broke RLIMIT_MEMLOCK.

Nope the patch fixed a problem with double accounting.

The problem that we seem to have is to define what mlocked and pinned mean
and how this relates to RLIMIT_MEMLOCK.

mlocked pages are pages that are movable (not pinned!!!) and that are
marked in some way by user space actions as mlocked (POSIX semantics).
They are marked with a special page flag (PG_mlocked).

Pinned pages are pages that have an elevated refcount because the hardware
needs to use these pages for I/O. The elevated refcount may be temporary
(then we dont care about this) or for a longer time (such as the memory
registration of the IB subsystem). That is when we account the memory as
pinned. The elevated refcount stops page migration and other things from
trying to move that memory.

Pages can be both pinned and mlocked. Before my patch some pages those two
issues were conflated since the same counter was used and therefore these
pages were counted twice. If an RDMA application was running using
mlockall() and was performing large scale I/O then the counters could show
extraordinary large numbers and the VM would start to behave erratically.

It is important for the VM to know which pages cannot be evicted but that
involves many more pages due to dirty pages etc etc.

So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
that userspace has mlocked.

You want the counter to mean something different it seems. What is it?

I think we need to be first clear on what we want to accomplish and what
these counters actually should count before changing things.

Certainly would appreciate improvements in this area but resurrecting the
conflation between mlocked and pinned pages is not the way to go.

> This patch proposes to properly fix the problem by introducing
> VM_PINNED. This also provides the groundwork for a possible mpin()
> syscall or MADV_PIN -- although these are not included.

Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
seems to assume.

> It recognises that pinned page semantics are a strict super-set of
> locked page semantics -- a pinned page will not generate major faults
> (and thus satisfies mlock() requirements).

Not exactly true. Pinned pages may not have the mlocked flag set and they
are not managed on the unevictable LRU lists of the MM.

> If people find this approach unworkable, I request we revert the above
> mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> again.

Cannot do that. This will cause the breakage that the patch was fixing to
resurface.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/