Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Nick Piggin
Date: Thu May 28 2009 - 08:24:13 EST


On Thu, May 28, 2009 at 05:59:34PM +0800, Wu Fengguang wrote:
> Hi Nick,
>
> > > + /*
> > > + * remove_from_page_cache assumes (mapping && !mapped)
> > > + */
> > > + if (page_mapping(p) && !page_mapped(p)) {
> > > + remove_from_page_cache(p);
> > > + page_cache_release(p);
> > > + }
> >
> > remove_mapping would probably be a better idea. Otherwise you can
> > probably introduce pagecache removal vs page fault races which
> > will make the kernel bug.
>
> We use remove_mapping() at first, then discovered that it made strong
> assumption on page_count=2.
>
> I guess it is safe from races since we are locking the page?

Yes it probably should (although you will lose get_user_pages data, but
I guess that's the aim anyway).

But I just don't like this one file having all that required knowledge
(and few comments) of all the files in mm/. If you want to get rid
of the page and don't care what it's count or dirtyness is, then
truncate_inode_pages_range is the correct API to use.

(or you could extract out some of it so you can call it directly on
individual locked pages, if that helps).


> > > + }
> > > +
> > > + me_pagecache_clean(p);
> > > +
> > > + /*
> > > + * Did the earlier release work?
> > > + */
> > > + if (page_has_private(p) && !try_to_release_page(p, GFP_NOIO))
> > > + return FAILED;
> > > +
> > > + return RECOVERED;
> > > +}
> > > +
> > > +/*
> > > + * Clean and dirty swap cache.
> > > + */
> > > +static int me_swapcache_dirty(struct page *p)
> > > +{
> > > + ClearPageDirty(p);
> > > +
> > > + if (!isolate_lru_page(p))
> > > + page_cache_release(p);
> > > +
> > > + return DELAYED;
> > > +}
> > > +
> > > +static int me_swapcache_clean(struct page *p)
> > > +{
> > > + ClearPageUptodate(p);
> > > +
> > > + if (!isolate_lru_page(p))
> > > + page_cache_release(p);
> > > +
> > > + delete_from_swap_cache(p);
> > > +
> > > + return RECOVERED;
> > > +}
> >
> > All these handlers are quite interesting in that they need to
> > know about most of the mm. What are you trying to do in each
> > of them would be a good idea to say, and probably they should
> > rather go into their appropriate files instead of all here
> > (eg. swapcache stuff should go in mm/swap_state for example).
>
> Yup, they at least need more careful comments.
>
> Dirty swap cache page is tricky to handle. The page could live both in page
> cache and swap cache(ie. page is freshly swapped in). So it could be referenced
> concurrently by 2 types of PTEs: one normal PTE and another swap PTE. We try to
> handle them consistently by calling try_to_unmap(TTU_IGNORE_HWPOISON) to convert
> the normal PTEs to swap PTEs, and then
> - clear dirty bit to prevent IO
> - remove from LRU
> - but keep in the swap cache, so that when we return to it on
> a later page fault, we know the application is accessing
> corrupted data and shall be killed (we installed simple
> interception code in do_swap_page to catch it).

OK this is the point I was missing.

Should all be commented and put into mm/swap_state.c (or somewhere that
Hugh prefers).


> Clean swap cache pages can be directly isolated. A later page fault will bring
> in the known good data from disk.

OK, but why do you ClearPageUptodate if it is just to be deleted from
swapcache anyway?


> > You haven't waited on writeback here AFAIKS, and have you
> > *really* verified it is safe to call delete_from_swap_cache?
>
> Good catch. I'll soon submit patches for handling the under
> read/write IO pages. In this patchset they are simply ignored.

Well that's quite important ;) I would suggest you just wait_on_page_writeback.
It is simple and should work. _Unless_ you can show it is a big problem that
needs equivalently big mes to fix ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/