Re: [RFC] Changing COW detection to be memory hotplug friendly

From: IWAMOTO Toshihiro
Date: Wed Feb 09 2005 - 04:11:34 EST


At Mon, 7 Feb 2005 21:24:59 +0000 (GMT),
Hugh Dickins wrote:
>
> On Thu, 3 Feb 2005, IWAMOTO Toshihiro wrote:
> > The current implementation of memory hotremoval relies on that pages
> > can be unmapped from process spaces. After successful unmapping,
> > subsequent accesses to the pages are blocked and don't interfere
> > the hotremoval operation.
> >
> > However, this code
> >
> > if (PageSwapCache(page) &&
> > page_count(page) != page_mapcount(page) + 2) {
> > ret = SWAP_FAIL;
> > goto out_unmap;
> > }
>
> Yes, that is odd code. It would be nice to have a solution without it.
>
> > in try_to_unmap_one() prevents unmapping pages that are referenced via
> > get_user_pages(), and such references can be held for a long time if
> > they are due to such as direct IO.
> > I've made a test program that issues multiple direct IO read requests
> > against a single read buffer, and pages that belong to the buffer
> > cannot be hotremoved because they aren't unmapped.
>
> I haven't looked at the rest of your hotremoval, so it's not obvious
> to me how a change here would help you - obviously you wouldn't want
> to be migrating pages while direct IO to them was in progress.
>
> I presume your patch works for you by letting the page count fall
> to a point where migration moves it automatically as soon as the
> got_user_pages are put, where without your patch the count is held
> too high, and you keep doing scans which tend to miss the window
> in which those pages are put?

Yes. And my test program has no such time window because IO requests
are issued without waiting for completion of older requests.
I think issuing IO requests in such a manner is nonsense, but
a misbehaving process shouldn't be able to prevent memory hotremoval.

If the hotremoval code can unmap a page from a process space, the
process will be blocked when it causes a page fault against the page.
So, a process cannot continuously issue direct IO requests to keep
page counts high. (get_user_pages() causes a (simulated) page fault.)


> > - The can_share_swap_page() call in do_swap_page() always returns
> > false. It is inefficient but should be harmless. Incrementing
> > page_mapcount before calling that function should fix the problem,
> > but it may cause bad side effects.
>
> Odd that your patch moves it if it now doesn't even work!
> But I think some more movement should be able to solve that.

Moving swap_free() is necessary to avoid can_share_swap_page()
returning true when it shouldn't. And, write access cases are handled
later with do_wp_page() call, anyway.

> > - Another obvious solution to this issue is to find the "offending"
> > process from a un-unmappable page and suspend it until the page is
> > unmapped. I'm afraid the implementation would be much more complicated.
>
> Agreed, let's not get into that.

Nice to hear that.

> > --- mm/memory.c.orig 2005-01-17 14:47:11.000000000 +0900
> > +++ mm/memory.c 2005-01-17 14:55:51.000000000 +0900
> > @@ -1786,10 +1786,6 @@ static int do_swap_page(struct mm_struct
> > }
> >
> > /* The page isn't present yet, go ahead with the fault. */
> > -
> > - swap_free(entry);
> > - if (vm_swap_full())
> > - remove_exclusive_swap_page(page);
> >
> > mm->rss++;
> > acct_update_integrals();
> > @@ -1800,6 +1796,10 @@ static int do_swap_page(struct mm_struct
> > pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> > write_access = 0;
> > }
> > +
> > + swap_free(entry);
> > + if (vm_swap_full())
> > + remove_exclusive_swap_page(page);
> > unlock_page(page);
> >
> > flush_icache_page(vma, page);
> > --- mm/rmap.c.orig 2005-01-17 14:40:08.000000000 +0900
> > +++ mm/rmap.c 2005-01-21 12:34:06.000000000 +0900
> > @@ -569,8 +569,11 @@ static int try_to_unmap_one(struct page
> > */
> > if (PageSwapCache(page) &&
> > page_count(page) != page_mapcount(page) + 2) {
> > - ret = SWAP_FAIL;
> > - goto out_unmap;
> > + if (page_mapcount(page) > 1) { /* happens when COW is in progress */
> > + ret = SWAP_FAIL;
> > + goto out_unmap;
> > + }
> > + printk("unmapping GUPed page\n");
> > }
> >
> > /* Nuke the page table entry. */
>
> I'm disappointed to see you making this more complicated, it's
> even harder to understand than before. I believe that if we're
> going to make good use of page_mapcount in can_share_swap_page,
> it should be possible to delete this block from try_to_unmap_one
> altogether. Or did you try that, and find it goes wrong?

I just wanted to be conservative to get a working patch.
I think this block can be deleted.

> > --- mm/swapfile.c.orig 2005-01-17 14:47:11.000000000 +0900
> > +++ mm/swapfile.c 2005-01-31 16:59:11.000000000 +0900

> This can_share_swap_page looks messier than I'd want.
>
> I was hoping to append my own patch to this response, but I haven't
> got it working right yet (swap seems too full). I hope tomorrow,
> but thought I'd better not delay these comments any longer.

I saw your patch in the other mail, and it looks better and should fix
my problem. I'll try and report.

--
IWAMOTO Toshihiro
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/