Re: RFC: Block reservation for hugetlbfs

From: David Gibson
Date: Tue Feb 21 2006 - 18:45:18 EST


On Tue, Feb 21, 2006 at 03:18:59PM +1100, Nick Piggin wrote:
> David Gibson wrote:
> >These days, hugepages are demand-allocated at first fault time.
> >There's a somewhat dubious (and racy) heuristic when making a new
> >mmap() to check if there are enough available hugepages to fully
> >satisfy that mapping.
> >
> >A particularly obvious case where the heuristic breaks down is where a
> >process maps its hugepages not as a single chunk, but as a bunch of
> >individually mmap()ed (or shmat()ed) blocks without touching and
> >instantiating the blocks in between allocations. In this case the
> >size of each block is compared against the total number of available
> >hugepages. It's thus easy for the process to become overcommitted,
> >because each block mapping will succeed, although the total number of
> >hugepages required by all blocks exceeds the number available. In
> >particular, this defeats such a program which will detect a mapping
> >failure and adjust its hugepage usage downward accordingly.
> >
> >The patch below is a draft attempt to address this problem, by
> >strictly reserving a number of physical hugepages for hugepages inodes
> >which have been mapped, but not instatiated. MAP_SHARED mappings are
> >thus "safe" - they will fail on mmap(), not later with a SIGBUS.
> >MAP_PRIVATE mappings can still SIGBUS.
> >
> >This patch appears to address the problem at hand - it allows DB2 to
> >start correctly, for instance, which previously suffered the failure
> >described above. I'm almost certain I'm missing some locking or other
> >synchronization - I am entirely bewildered as to what I need to hold
> >to safely update i_blocks as below. Corrections for my ignorance
> >solicited...
> >
> >Signed-off-by: David Gibson <dwg@xxxxxxxxxxx>
> >
>
> This introduces
> tree_lock(r) -> hugetlb_lock
>
> And we already have
> hugetlb_lock -> lru_lock
>
> So we now have tree_lock(r) -> lru_lock, which would deadlock
> against lru_lock -> tree_lock(w), right?
>
> From a quick glance it looks safe, but I'd _really_ rather not
> introduce something like this.

Urg.. good point. I hadn't even thought of that consequence - I was
more worried about whether I need i_lock or i_mutex to protect my
updates to i_blocks.

Would hugetlb_lock -> tree_lock(r) be any preferable (I think that's a
possible alternative).

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/