Re: [PATCH 0/3] Guarantee faults for processes that callmmap(MAP_PRIVATE) on hugetlbfs v2

From: David Gibson
Date: Wed May 07 2008 - 21:57:29 EST


On Wed, May 07, 2008 at 08:38:26PM +0100, Mel Gorman wrote:
> MAP_SHARED mappings on hugetlbfs reserve huge pages at mmap() time.
> This guarantees all future faults against the mapping will succeed.
> This allows local allocations at first use improving NUMA locality whilst
> retaining reliability.
>
> MAP_PRIVATE mappings do not reserve pages. This can result in an application
> being SIGKILLed later if a huge page is not available at fault time. This
> makes huge pages usage very ill-advised in some cases as the unexpected
> application failure cannot be detected and handled as it is immediately fatal.
> Although an application may force instantiation of the pages using mlock(),
> this may lead to poor memory placement and the process may still be killed
> when performing COW.
>
> This patchset introduces a reliability guarantee for the process which creates
> a private mapping, i.e. the process that calls mmap() on a hugetlbfs file
> successfully. The first patch of the set is purely mechanical code move to
> make later diffs easier to read. The second patch will guarantee faults up
> until the process calls fork(). After patch two, as long as the child keeps
> the mappings, the parent is no longer guaranteed to be reliable. Patch
> 3 guarantees that the parent will always successfully COW by unmapping
> the pages from the child in the event there are insufficient pages in the
> hugepage pool in allocate a new page, be it via a static or dynamic pool.

I don't think patch 3 is a good idea. It's a fair bit of code to
implement a pretty bizarre semantic that I really don't think is all
that useful. Patches 1-2 are already sufficient to cover the
fork()/exec() case and a fair proportion of fork()/minor
frobbing/exit() cases. If the child also needs to write the hugepage
area, chances are it's doing real work and we care about its
reliability too.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/