Re: [PATCH 0/3]HTLB mapping for drivers (take 2)

From: Mel Gorman
Date: Wed Aug 19 2009 - 06:06:30 EST


On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
> Hi,
> >
> > It sounds like this patch set is working towards the same goal as my
> > MAP_HUGETLB set.  The only difference I see is that you allocate a huge
> > page at a time and (if I am understanding the patch) fault the page in
> > immediately, where MAP_HUGETLB only faults pages as needed.  Does the
> > MAP_HUGETLB patch set provide the functionality that you need, and if
> > not, what can be done to provide what you need?
> >
>
> Thanks a lot for being willing to help. I would much appreciate it if you
> have an interesting idea for how HTLB mapping for drivers can be done.
>
> It is better to describe the use case in order to make it clear what
> needs to be done.
> The driver provides mapping of device DMA buffers to user level
> applications.

Ok, so the buffer is in normal memory. When mmap() is called, the buffer
is already populated by data DMA'd from the device. That scenario rules out
calling mmap(MAP_ANONYMOUS|MAP_HUGETLB) because userspace has access to the
buffer before it is populated by data from the device.

However, it does not rule out mmap(MAP_ANONYMOUS|MAP_HUGETLB) when userspace
is responsible for populating a buffer for sending to a device, i.e. whether it
is suitable or not depends on when the buffer is populated and who is doing it.
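
As a rough sketch of that second case, where userspace owns and fills the
buffer, something like the following would do. It assumes a kernel and libc
that expose MAP_HUGETLB and a huge page size passed in by the caller (2MB is
only a typical x86-64 value). map_huge_anon() is a made-up name for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous huge-page-backed buffer that
 * userspace will fill itself before handing it to a device.
 * hpage_size is an assumption supplied by the caller (e.g. 2MB).
 * Returns NULL if the kernel cannot back the mapping with huge pages.
 */
static void *map_huge_anon(size_t len, size_t hpage_size, size_t *mapped_len)
{
	size_t rounded;
	void *p;

	assert(hpage_size != 0 && mapped_len != NULL);

	/* hugetlb mappings are sized in whole huge pages, so round up */
	rounded = (len + hpage_size - 1) / hpage_size * hpage_size;
	p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*mapped_len = rounded;
	return p;
}
```

A caller would check for NULL, fill the buffer, pass it to the device and
munmap() it with the returned mapped_len when done.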

> User level applications process the data.
> The device uses master DMA to send data to the user buffer. The buffer
> size can be >1GB and performance is very important. (So huge page
> mapping really makes sense.)
>

Ok, so the DMA may be faster because you have to do less scatter/gather
and can DMA in larger chunks, and reading from userspace may be faster
because there is less translation overhead. Right?
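
To put rough numbers on the scatter/gather point: assuming one entry per
physically contiguous page, 4KB base pages and 2MB huge pages (typical x86-64
sizes, not something stated for your hardware), the count for a given buffer
can be worked out as below. sg_entries() is just an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: number of scatter/gather entries needed to cover
 * buf_len bytes when each entry describes one physically contiguous
 * page of page_size bytes. Real hardware may merge or split entries.
 */
static size_t sg_entries(size_t buf_len, size_t page_size)
{
	assert(page_size != 0);
	return (buf_len + page_size - 1) / page_size;	/* round up */
}
```

With those assumed sizes, sg_entries(1UL << 30, 4096) is 262144 while
sg_entries(1UL << 30, 2UL << 20) is 512, and the number of TLB entries needed
to cover the buffer drops by the same factor.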

> In addition we have to mention that:
> 1. It is hard for the user to tell how many huge pages need to be
> reserved by the driver.

I think you have this problem either way. If the buffer is allocated and
populated before mmap(), then the driver is going to have to guess how many
pages it needs. If the DMA occurs as a result of mmap(), it's easier because
you know the number of huge pages to be reserved at that point and you have
the option of falling back to small pages if necessary.
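
Roughly, the decision at mmap() time could look like the following. This is a
userspace model rather than driver code: struct map_plan, plan_mapping() and
the free_hpages parameter are invented for illustration, and a real driver
would learn what is available from the hugepage pool rather than from an
argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of deciding how to back a mapping of a given length */
struct map_plan {
	bool use_huge;		/* true if huge pages can cover the mapping */
	size_t nr_pages;	/* pages to reserve, in the chosen page size */
};

static struct map_plan plan_mapping(size_t len, size_t hpage_size,
				    size_t small_size, size_t free_hpages)
{
	struct map_plan plan;
	size_t nr_huge;

	assert(hpage_size != 0 && small_size != 0);
	nr_huge = (len + hpage_size - 1) / hpage_size;

	if (nr_huge <= free_hpages) {
		/* The length is known, so the reservation is exact */
		plan.use_huge = true;
		plan.nr_pages = nr_huge;
	} else {
		/* Not enough huge pages: fall back to small pages */
		plan.use_huge = false;
		plan.nr_pages = (len + small_size - 1) / small_size;
	}
	return plan;
}
```

For a 5MB mapping with 2MB huge pages this reserves 3 huge pages if at least
3 are free, and otherwise falls back to 1280 pages of 4KB.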

> 2. Devices add constraints on memory regions. For example, it needs to
> be contiguous within the physical address space. It is necessary to
> have the ability to specify special gfp flags.

The contiguity constraints are the same for huge pages. Do you mean there
are zone restrictions? If so, the hugetlbfs_file_setup() function could be
extended to specify a GFP mask that is used for the allocation of hugepages
and associated with the hugetlbfs inode. Right now, there is a
htlb_alloc_mask that is combined with some additional flags, so
htlb_alloc_mask would be the default mask unless otherwise specified.
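
To illustrate the default I have in mind, here is a toy model in plain C. The
flag values, struct toy_inode and toy_alloc_mask() are all made up for the
example. None of this is the current hugetlbfs code, and only the name
htlb_alloc_mask comes from the kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_gfp_t;

/* Made-up flag values, for illustration only */
#define TOY_GFP_HIGHUSER	0x1u
#define TOY_GFP_DMA32		0x2u
#define TOY_GFP_COMP		0x4u	/* stands in for the additional flags */

/* Stand-in for the global default mask */
static toy_gfp_t htlb_alloc_mask = TOY_GFP_HIGHUSER;

/* Hypothetical per-inode state: 0 means "no mask specified" */
struct toy_inode {
	toy_gfp_t alloc_mask;
};

/* Mask used when allocating a huge page for this inode */
static toy_gfp_t toy_alloc_mask(const struct toy_inode *inode)
{
	toy_gfp_t base;

	assert(inode != NULL);
	base = inode->alloc_mask ? inode->alloc_mask : htlb_alloc_mask;
	return base | TOY_GFP_COMP;
}
```

An inode set up without a mask keeps today's behaviour, while a driver with
zone restrictions would set alloc_mask when it creates the file.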

> 3. The HW needs to access physical memory before the user level
> software can access it. (Hugetlbfs picks up pages from the pool on page
> fault.)
> It means memory allocation needs to be driven by the device driver.
>

How about:

  o Extend Eric's helper slightly to take a GFP mask that is
    associated with the inode and used for allocations from
    outside the hugepage pool
  o A helper that returns the page at a given offset within
    a hugetlbfs file for population before the page has been
    faulted.

I know this is a bit hand-wavy, but it would allow significant sharing
of the existing code and remove much of the hugetlbfs-awareness from
your current driver.
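
For the second helper, the core of it is turning a file offset into a huge
page index and handing back whatever sits at that index, so the driver can
fill it before any fault. A toy userspace model, with struct toy_hfile,
toy_page_at() and the fixed 2MB shift all invented for the example:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HPAGE_SHIFT	21	/* assumes 2MB huge pages */
#define TOY_MAX_PAGES	8

/* Toy stand-in for a hugetlbfs file: one slot per huge page */
struct toy_hfile {
	void *pages[TOY_MAX_PAGES];	/* NULL until a page is attached */
};

/*
 * Return the page backing byte `offset` of the file, or NULL if the
 * offset is out of range or no page has been attached there yet.
 */
static void *toy_page_at(const struct toy_hfile *f, size_t offset)
{
	size_t idx;

	assert(f != NULL);
	idx = offset >> TOY_HPAGE_SHIFT;
	if (idx >= TOY_MAX_PAGES)
		return NULL;
	return f->pages[idx];
}
```

The driver would call something like this for each offset it wants the device
to DMA into, and a later fault at that offset would find the same page.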

> The original idea was: create a hugetlbfs file which has a common mapping
> with the device file. Allocate memory. Populate the page cache of the
> hugetlbfs file with the allocated pages.
> When a fault occurs, the page will be taken from the page cache and then
> remapped to user space by hugetlbfs.
>
> Another possible approach is described here:
> http://marc.info/?l=linux-mm&m=125065257431410&w=2
> But I am currently not sure whether it will work or not.
>
>
> Thanks,
> Alexey
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab