Re: missing madvise functionality

From: Eric Dumazet
Date: Tue Apr 03 2007 - 17:11:34 EST


Rik van Riel a Ãcrit :
Andrew Morton wrote:

Oh. I was assuming that we'd want to unmap these pages from pagetables and
mark then super-easily-reclaimable. So a later touch would incur a minor
fault.

But you think that we should leave them mapped into pagetables so no such
fault occurs.

Leaving the pages mapped into pagetables means that they are considerably
less likely to be reclaimed.

If we move the pages to a place where they are very likely to be
reclaimed quickly (end of the inactive list, or a separate
reclaim list) and clear the dirty and referenced lists, we can
both reclaim the page easily *and* avoid the page fault penalty.


There is one possible speedup :

- If an user app does a madvise(MADV_DONTNEED), we can assume the pages can later be bring back without need to zero them. The application doesnt care.

A page fault is not that expensive. But clearing N*PAGE_SIZE bytes is, because it potentially evicts a large part of CPU cache.

If I recall well, mysql bench Ulrich mentioned was allocating/freeing large areas (100 Kbytes or so) in a loop.

mmap()/brk() must give fresh NULL pages, but maybe madvise(MADV_DONTNEED) can relax this requirement (if the pages were reclaimed, then a page fault could bring a new page with random content)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/