Re: [Lse-tech] RE: [PATCH] HUGETLB memory commitment

From: Ray Bryant
Date: Mon Apr 05 2004 - 21:01:15 EST


Hi Ken,

Chen, Kenneth W wrote:
Ray Bryant wrote on Monday, April 05, 2004 11:22 AM

Chen, Kenneth W wrote:
I actually started coding yesterday. It doesn't look too bad (I think).
I will post it once I finished it up later today or tomorrow.

Hmmm...so did I. Oh well. We can pull the good ideas from both. :-)


I did have a revelation from your original demand-paging patch with per-inode
tracking ;-) I extended it into tracking by struct address_space (so we don't
pollute inode structure) and added per-block tracking. See patch at the end of
this post. I admit I had very pessimistic thoughts until I saw your patch.


Cool!

Either way works, I think. I just used the u.generic_ip pointer because it was there and convenient. It's intended to be a hook in the VFS layer for a particular file system to add info to the inode, as near as I can tell, but neither hugetlbfs nor shmfs use it, so it was free for the taking.



There are still some oddity in lifetime of the huge page reservation,
but that can be discussed once everyone sees the code.


I was thinking the lifetime of the huge page reservation should be the life
of a mapping, i.e., only persist across mmap/munmap. That means add a ref
count in the per-block tracking. This seriously complicates the design
because now, ref count needs to be updated in munmap and fault_hander in
addition to the mmap and truncate. Not to mention that Andy Whitcroft already
pointed out we don't get notification from munmap. Plus it seriously make
tracking logic complicated and have performance down side as well.

I guess everyone is OK with reservation lives until file truncate?

One can certainly argue that the only thing that is required to live until file truncate is the contents of the huge pages in the page cache, since applications expect that the data will be there in the file/segment across program executions until the file is truncated or the segment deleted.

But it certainly makes sense to me that if program A creates an mmap()'d file of 10 huge pages, that if Program B comes along later and re-mmaps() that same file, that Program B will be guaranteed to be able to touch all 10 pages, even if Program A only touched 5. So that is an argument for having the reservation last until file truncate/segment removal time.

Additionally, recall that we are trying to emulate the behavior of the hugetlb_prefault() implementation. Under that implementation, if Program A would mmap() 10 huge pages, then Program B would be guarenteed not to get a SIGBUS when it mmap()'s and references those 10 pages, provided only that the underlying file/segment was not deleted in between execution of the two programs.

So, I think we >>have<< to have the reservation last until file truncate/segment deletion time. Fortunately, that turns out to be easier to implement as well. :-)

I'll check through your patch and make sure we've both covered the same bases there. If so, we should be good to go with either version.

--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@xxxxxxx raybry@xxxxxxxxxxxxx
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/