Re: can device drivers return non-ram via vm_ops->nopage?

From: Andrea Arcangeli
Date: Sun Mar 21 2004 - 19:36:33 EST


On Sun, Mar 21, 2004 at 11:58:54PM +0000, Russell King wrote:
> On Sun, Mar 21, 2004 at 03:51:31PM -0800, Linus Torvalds wrote:
> > That might be the minimal fix, since it would basically involve:
> > - change whatever offensive "virt_to_page()" calls into
> > "dma_map_to_page()".
> > - implement "dma_map_to_page()" for all architectures.
> >
> > Would that make people happy?
>
> Unfortunately this doesn't make dwmw2 happy - he claims to have machines
> which implement dma_alloc_coherent using RAM which doesn't have any
> struct page associated with it.

I would suggest adding a ->nopage_dma callback (or whatever other name)
to the vm_ops that returns a non-pageable "pfn" number (not a page_t *).
This is all the VM needs to set up the pte properly. This callback will
not know anything about the pageable stuff (i.e. it will not have to
call page_add_rmap or anything like that).

I definitely agree that a driver currently has no way to work safely if
it returns non-ram via ->nopage, and that it must use remap_file_pages.
OTOH I don't like remap_file_pages myself: it's a lot nicer to use paging
even for mapping non-ram, even if you don't use scatter-gather, even if
you just have a huge block of contiguous physical ram, at the very least
to avoid the scheduler latencies of a loop under the page_table_lock.

nopage_dma will be like this:

do_no_page_dma(vma, ...)
{
	pfn = vma->vm_ops->nopage_dma();
	if (pfn_valid(pfn)) {
		/*
		 * going from a valid pfn to a page is always ok,
		 * the other way around is not
		 */
		page = pfn_to_page(pfn);
		BUG_ON(page->mapping);
		if (!PageReserved(page))
			mm->rss++;
	}
	/*
	 * set up the pte using the pfn here; no vm accounting or pte
	 * tracking is required since it's either a non-valid pfn or a
	 * reserved page that will be ignored by the zap_pte stuff
	 */
}

do_no_page()
{
	if (!vma->vm_ops || !vma->vm_ops->nopage)
		return do_anonymous_page(mm, vma, page_table,
					 pmd, write_access, address);
	if (vma->vm_ops->nopage_dma)
		return do_no_page_dma(...);
}
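
To make the accounting rule concrete, here is a minimal userspace sketch
of the same flow. It is not kernel code: every demo_* name, the 4-page
mem_map and the 12-bit page shift are assumptions made up for the
illustration, and the "pte" is just the stored pfn. The sketch also
checks ->nopage_dma before the ->nopage test, which is my assumption
about how a driver that supplies only ->nopage_dma should be routed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Every name here is a hypothetical userspace stand-in, not a kernel API. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_NR_PAGES   4UL	/* pfns 0..3 are backed by a struct page */

struct demo_page { bool reserved; void *mapping; };
struct demo_mm   { unsigned long rss; };
struct demo_vma;

struct demo_vm_ops {
	/* existing style: returns a page, so it needs a struct page */
	struct demo_page *(*nopage)(struct demo_vma *vma, unsigned long addr);
	/* proposed style: returns a bare pfn, no struct page needed */
	unsigned long (*nopage_dma)(struct demo_vma *vma, unsigned long addr);
};

struct demo_vma {
	struct demo_mm *mm;
	const struct demo_vm_ops *ops;
	unsigned long pte;	/* stand-in for the installed pte: just the pfn */
};

enum demo_path { DEMO_ANON, DEMO_NOPAGE, DEMO_DMA };

static struct demo_page demo_mem_map[DEMO_NR_PAGES];

static bool demo_pfn_valid(unsigned long pfn)
{
	return pfn < DEMO_NR_PAGES;
}

static void do_no_page_dma(struct demo_vma *vma, unsigned long addr)
{
	unsigned long pfn = vma->ops->nopage_dma(vma, addr);

	if (demo_pfn_valid(pfn)) {
		/* valid pfn -> page is always ok, the reverse is not */
		struct demo_page *page = &demo_mem_map[pfn];

		assert(page->mapping == NULL);	/* plays the role of BUG_ON */
		if (!page->reserved)
			vma->mm->rss++;
	}
	vma->pte = pfn;		/* no rmap or other pte tracking */
}

static enum demo_path do_no_page(struct demo_vma *vma, unsigned long addr)
{
	if (vma->ops && vma->ops->nopage_dma) {
		do_no_page_dma(vma, addr);
		return DEMO_DMA;
	}
	if (!vma->ops || !vma->ops->nopage)
		return DEMO_ANON;	/* do_anonymous_page() would run here */
	return DEMO_NOPAGE;		/* existing ->nopage path, not modelled */
}

/* Example driver callback: identity-maps fault addresses to pfns. */
static unsigned long demo_nopage_dma(struct demo_vma *vma, unsigned long addr)
{
	(void)vma;
	return addr >> DEMO_PAGE_SHIFT;
}
```

A fault at an address whose pfn lies beyond DEMO_NR_PAGES installs the
pte without touching rss or any struct page, which is the case the
callback is meant to cover.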

Then the mmu VM troubles are over. How you keep the cache of this pte
view coherent with the iommu view isn't something solvable by the mmu,
but you can certainly add whatever cache flushing callback you need in
the do_no_page_dma core. That's a slow path, so any arch can play with
it and add whatever library calls it needs.

btw, on a slightly related note, I don't think this is safe in
get_user_pages in 2.6:

	if (!PageReserved(pages[i]))
		page_cache_get(pages[i]);

there's nothing preventing munmap from freeing the page while somebody
does I/O on the page via get_user_pages.