Re: [RFC 0/3] hugetlbfs: optionally reserve all fs pages at mount time

From: Michal Hocko
Date: Fri Mar 06 2015 - 10:10:54 EST


On Mon 02-03-15 17:18:14, Mike Kravetz wrote:
> On 03/02/2015 03:10 PM, Andrew Morton wrote:
> >On Fri, 27 Feb 2015 14:58:08 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> >>hugetlbfs allocates huge pages from the global pool as needed. Even if
> >>the global pool contains a sufficient number of pages for the filesystem
> >>size at mount time, those global pages could be grabbed for some other
> >>use. As a result, filesystem huge page allocations may fail due to lack
> >>of pages.
> >
> >Well OK, but why is this a sufficiently serious problem to justify
> >kernel changes? Please provide enough info for others to be able
> >to understand the value of the change.
> >
>
> Thanks for taking a look.
>
> Applications such as a database want to use huge pages for performance
> reasons. hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages. However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages. Before starting, the application will ensure
> that enough huge pages exist on the system in the global pools. What
> the application wants is exclusive use of a pool of huge pages.
>
> One could argue that this is a system administration issue. The global
> huge page pools are only available to users with root privilege.
> Therefore, exclusive use of a pool of huge pages can be obtained by
> limiting access. However, many applications are installed to run with
> elevated privilege to take advantage of resources like huge pages. It
> is quite possible for one application to interfere with another, especially
> in the case of something like huge pages where the pool size is mostly
> fixed.
>
> Suggestions for other ways to approach this situation are appreciated.
> I saw the existing support for "reservations" within hugetlbfs and
> thought of extending this to cover the size of the filesystem.

Maybe I do not understand your use case properly, but wouldn't the hugetlb
cgroup (CONFIG_CGROUP_HUGETLB) help to guarantee the same? Just
configure limits for different users/applications (inside different
groups) so that they never overcommit the existing pool. Would that work
for you?
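
To make the suggestion concrete, here is a minimal sketch of how such
limits could be planned and written. It assumes a cgroup v1 hugetlb
controller mounted at /sys/fs/cgroup/hugetlb, 2 MB huge pages, and
already-created groups; the group names and helper functions are
hypothetical and only for illustration.

```python
from pathlib import Path

# Assumptions: cgroup v1 hugetlb controller mount point and 2 MB huge pages.
HUGETLB_ROOT = Path("/sys/fs/cgroup/hugetlb")
PAGE_SIZE_BYTES = 2 * 1024 * 1024


def plan_limits(pool_pages, requested_pages, page_size=PAGE_SIZE_BYTES):
    """Return {group: limit in bytes}; refuse plans that overcommit the pool."""
    total = sum(requested_pages.values())
    if total > pool_pages:
        raise ValueError(f"requested {total} pages, pool has only {pool_pages}")
    return {group: pages * page_size for group, pages in requested_pages.items()}


def apply_limits(limits, root=HUGETLB_ROOT, size_tag="2MB"):
    """Write each limit to the group's hugetlb.<size>.limit_in_bytes file.

    The group directories must already exist; on a real system this
    needs sufficient privilege.
    """
    for group, limit in limits.items():
        limit_file = Path(root) / group / f"hugetlb.{size_tag}.limit_in_bytes"
        limit_file.write_text(f"{limit}\n")
```

For example, with a pool of 1024 pages, plan_limits(1024, {"db": 768,
"other": 256}) gives each group a byte limit and the sum exactly fills
the pool, while asking for 1025 pages in total raises an error. Note
that the limits only cap each group; they protect the database only if
every other user of the pool runs inside a limited group as well.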

--
Michal Hocko
SUSE Labs