Re: [RFC] Changing COW detection to be memory hotplug friendly

From: Andrea Arcangeli
Date: Thu Feb 10 2005 - 14:06:47 EST


On Tue, Feb 08, 2005 at 04:26:26PM +0000, Hugh Dickins wrote:
> Seems it was okay after all, I got confused by an unrelated issue.
> Here's what I had in mind, what do you think? Does it indeed help
> with your problem? I'm copying Andrea because it was he who devised
> that fix to the get_user_pages issue, and also (I think, longer ago)
> can_share_swap_page itself.
>
> This does rely on us moving 1 from mapcount to swapcount or back only
> while page is locked - there are places (e.g. exit) where mapcount
> comes down without page being locked, but that's not an issue: we just
> have to be sure that once we have mapcount, it can't go up while reading
> swapcount (I've probably changed more than is strictly necessary, but
> this seemed clearest to me).
>
> I've left out how we ensure swapoff makes progress on a page with high
> mapcount, haven't quite made my mind out about that: it's less of an
> issue now there's an activate_page in unuse_pte, plus it's not an
> issue which will much bother anyone but me, can wait until after.
>
> That PageAnon check in do_wp_page: seems worthwhile to avoid locking
> and unlocking the page if it's a file page, leaves can_share_swap_page
> simpler (a PageReserved is never PageAnon). But the patch is against
> 2.6.11-rc3-mm1, so I appear to be breaking the intention of
> do_wp_page_mk_pte_writable ("on a shared-writable page"),
> believe that's already broken but need to study it more.

The reason pinned pages cannot be unmapped is that if they're being
unmapped while a rawio read is in progress, a cow on some shared
swapcache under I/O could happen.

If a try_to_unmap + COW over a shared swapcache happens during a rawio
read, then the rawio reads will get lost generating data corruption.

I had not much time to check the patch yet, but it's quite complex and
could you explain again how do you prevent a cow to happen while the
rawio is in flight if you don't pin the pte? The PG_locked bitflag
cannot matter at all because the rawio read won't lock the page. What
you have to prevent is the pte to be zeroed out, it must be kept
writeable during the whole I/O. I don't see how you can allow unmapping
without running into COWs later on.

This is the only thing I care to understand really, since it's the only
case that the pte pinning was fixing.

Overall I see nothing wrong in preventing memory removal while rawio is
in flight. rawio cannot be in flight forever (ptrace and core dump as
well should complete eventually). Why can't you simply drop pages from
the freelist one by one until all of them are removed from the freelist?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/