Re: 2.5.67-mm3

From: Andrew Morton (akpm@digeo.com)
Date: Mon Apr 14 2003 - 23:55:41 EST


William Lee Irwin III <wli@holomorphy.com> wrote:
>
> Page clustering wants something similar but slightly different. The
> unit it wants as its stride (MMUPAGE_SIZE) isn't present so this doesn't
> really help or hurt it. I believe I actually dodged this bullet by
> ensuring (or incorrectly assuming) the callers used sizes <= MMUPAGE_SIZE
> and left it either unaltered and suboptimal or (worst-case) buggy.

Callers will use sizes between 1 and PAGE_CACHE_SIZE, with arbitrary
alignment. So you may need to fault in up to

        (PAGE_CACHE_SIZE / MMUPAGE_SIZE) + 1
        
pte's. And up to two PAGE_CACHE_SIZE pages.

Sort-of. The code is doing two things.

a) Make sure that all the relevant pte's are established in the correct
   state so we don't take a fault while holding the subsequent atomic kmap.

   This is just an optimisation. If we _do_ take the fault while holding
   an atomic kmap, we fall back to sleeping kmap, and do the whole copy
   again. It almost never happens.

b) Making sure that the pagecache page is present before we lock it. This
   is to handle the icky deadlock which occurs when someone is doing a
   write() into a MAP_SHARED region of the file, where the source and dest of
   the copy are the same physical page. If we take a fault and then try to
   bring the page uptodate in the fault handler we deadlock because the page
   is already locked.

   The fault-by-hand-before-locking-the-page is racy - if the VM steals
   the page again before we lock it (rare), the deadlock can still occur.

   I've been able to trigger the fault which causes fallback to kmap()
   occasionally, under heavy load. But never the deadlock.

   We don't know how to fix this for real. I had patch for a while which
   added current->locked_page, and filemap_nopage() would compare that with
   the to-be-locked page and say "ah-hah!" and take avoiding action.

   But then Hugh rudely pointed out that the deadlock was still present if
   two tasks were involved, each trying to fault in the other's locked page.

> I'm just going down the list of FIXME's in the VM I turned up by grepping.
> Should we do the following instead?

OK ;)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Apr 15 2003 - 22:00:34 EST