Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time

From: Mike Kravetz
Date: Mon Mar 02 2015 - 20:19:29 EST


On 03/02/2015 03:10 PM, Andrew Morton wrote:
> On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
>> hugetlbfs allocates huge pages from the global pool as needed. Even if
>> the global pool contains a sufficient number of pages for the filesystem
>> size at mount time, those global pages could be grabbed for some other
>> use. As a result, filesystem huge page allocations may fail due to lack
>> of pages.
>
> Well OK, but why is this a sufficiently serious problem to justify
> kernel changes? Please provide enough info for others to be able
> to understand the value of the change.


Thanks for taking a look.

Applications such as databases want to use huge pages for performance
reasons. hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages. However, the application
would like some reasonable assurance that allocations will not fail due
to a lack of huge pages. Before starting, the application will ensure
that enough huge pages exist on the system in the global pools. What
the application wants is exclusive use of a pool of huge pages.
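
As a rough illustration of that pre-start check (a sketch only, not code
from this patch series): the helper names meminfo_field() and
enough_huge_pages() are made up for this example, and it only looks at
the default huge page size counters HugePages_Free and HugePages_Rsvd
in /proc/meminfo. Note that the answer is only valid at the instant it
is read; nothing stops another user from taking the pages afterwards,
which is exactly the gap described above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of one "Name:  value" line from a
 * /proc/meminfo-style stream, or -1 if the field is not present.
 */
static long meminfo_field(FILE *fp, const char *name)
{
	char line[256];
	size_t len = strlen(name);
	long val;

	rewind(fp);
	while (fgets(line, sizeof(line), fp)) {
		if (strncmp(line, name, len) == 0 && line[len] == ':' &&
		    sscanf(line + len + 1, "%ld", &val) == 1)
			return val;
	}
	return -1;
}

/*
 * Hypothetical helper: return 1 if at least 'needed' default-size huge
 * pages are free and not already reserved, 0 otherwise. This is a
 * point-in-time check and gives no guarantee about later allocations.
 */
static int enough_huge_pages(FILE *fp, long needed)
{
	long free_pages = meminfo_field(fp, "HugePages_Free");
	long rsvd_pages = meminfo_field(fp, "HugePages_Rsvd");

	if (free_pages < 0)
		return 0;
	if (rsvd_pages < 0)
		rsvd_pages = 0;	/* treat a missing counter as "none reserved" */

	return free_pages - rsvd_pages >= needed;
}
```

An application would call enough_huge_pages() on a stream opened with
fopen("/proc/meminfo", "r") before creating its mappings, and refuse to
start if it returns 0.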

One could argue that this is a system administration issue. The global
huge page pools are only available to users with root privilege.
Therefore, exclusive use of a pool of huge pages can be obtained by
limiting access. However, many applications are installed to run with
elevated privilege to take advantage of resources like huge pages. It
is quite possible for one application to interfere with another, especially
in the case of something like huge pages where the pool size is mostly
fixed.

Suggestions for other ways to approach this situation are appreciated.
I saw the existing support for "reservations" within hugetlbfs and
thought of extending this to cover the size of the filesystem.

--
Mike Kravetz