Re: [BUG] infinite loop in find_get_pages()

From: Eric Dumazet
Date: Wed Sep 14 2011 - 16:56:01 EST


Le mercredi 14 septembre 2011 Ã 13:38 -0700, Hugh Dickins a Ãcrit :
> On Wed, 14 Sep 2011, Shaohua Li wrote:
> > On Wed, 2011-09-14 at 16:43 +0800, Eric Dumazet wrote:
> > > Le mercredi 14 septembre 2011 Ã 16:20 +0800, Shaohua Li a Ãcrit :
> > > > 2011/9/14 Shaohua Li <shli@xxxxxxxxxx>:
> > > > > it appears we didn't account skipped swap entry in find_get_pages().
> > > > > does the attached patch help?
> > > > I can easily reproduce the issue. Just cp files in tmpfs, trigger swap and
> > > > drop caches. The debug patch fixes it at my side.
> > > > Eric, please try it.
> > > >
> > >
> > > Hello Shaohua
> > >
> > > I tried it with added traces :
> > >
> > >
> > > [ 277.077855] mv used greatest stack depth: 3336 bytes left
> > > [ 310.558012] nr_found=2 nr_skip=2
> > > [ 310.558139] nr_found=14 nr_skip=14
> > > [ 332.195162] nr_found=2 nr_skip=2
> > > [ 332.195274] nr_found=14 nr_skip=14
> > > [ 352.315273] nr_found=14 nr_skip=14
> > > [ 372.673575] nr_found=14 nr_skip=14
> > > [ 397.115463] nr_found=14 nr_skip=14
> > > [ 403.391694] cc1 used greatest stack depth: 3184 bytes left
> > > [ 404.761194] cc1 used greatest stack depth: 2640 bytes left
> > > [ 417.306510] nr_found=14 nr_skip=14
> > > [ 440.198051] nr_found=14 nr_skip=14
> > >
> > > I also used :
> > >
> > > - if (unlikely(!ret && nr_found))
> > > + if (unlikely(!ret && nr_found > nr_skip))
> > > goto restart;
> > nr_found > nr_skip is better
> >
> > > It seems to fix the bug. I suspect it also aborts
> > > invalidate_mapping_pages() if we skip 14 pages, but existing comment
> > > states its OK :
> > >
> > > /*
> > > * Note: this function may get called on a shmem/tmpfs mapping:
> > > * pagevec_lookup() might then return 0 prematurely (because it
> > > * got a gangful of swap entries); but it's hardly worth worrying
> > > * about - it can rarely have anything to free from such a mapping
> > > * (most pages are dirty), and already skips over any difficulties.
> > > */
> > that might be a problem, let Hugh answer if it is.
>
> Thanks to you all for suffering, reporting and investigating this.
> Yes, in 3.1-rc I have converted an extremely rare try-again-once
> into a too-easily stumbled-upon endless loop.
>
> Would it be a problem to give up early on a shmem/tmpfs mapping in
> invalidate_mapping_pages()? No, not really: it's rare for it to find
> anything it can throw away from tmpfs, because it cannot recognize the
> clean swapcache pages (getting it to work on those would be nice, and
> something I did look into once, but it's not a job for today), and
> entirely clean pages (readonly mmap'ed zeroes never touched) are uncommon.
>
> However, I did independently run across scan_mapping_unevictable_pages()
> a few days ago: that uses pagevec_lookup() on shmem when doing SHM_UNLOCK,
> and although the normal case would be that everything then is in memory,
> I think it's not impossible for some to be swapped out (already swapped
> out at SHM_LOCK time, and not touched since), which should not stop it
> from doing its work on unswapped pages beyond.
>
> My preferred patch is below: but it does add a cond_resched() into
> find_get_pages(), which is really below the level at which we usually
> do cond_resched(). All callers appear fine with it, and in practice
> it would be very^14 rare on anything other than shmem/tmpfs: so this
> being rc6 I'm reluctant to make matters worse with a might_sleep().
>
> But I'm not signing this off yet, because I'm still mystified by the
> several reports of seemingly the same problem on 3.0.1 and 3.0.2,
> which I fear the patch below (even if adjusted to apply) will do
> nothing to help - there are no swap entries in radix_tree in 3.0.
>
> My suspicion is that there's some path by which a page gets trapped
> in the radix_tree with page count 0. While it's easy to imagine that
> THP's use of compaction and compaction's use of migration could have
> made a bug there more common, I do not see it.
>
> I'd like to think about that a little more before finalizing the
> patch below - does it work, and does it look acceptable so far?
> Of course, the mods to truncate.c and vmscan.c are not essential
> parts of this fix, just things to tidy up while on the subject.
> Right now I must attend to some other stuff, will return tomorrow.
>
> Hugh
>

Hello Hugh

I am going to test this ASAP, but have one question below :

> ---
>
> mm/filemap.c | 14 ++++++++++----
> mm/truncate.c | 8 --------
> mm/vmscan.c | 2 +-
> 3 files changed, 11 insertions(+), 13 deletions(-)
>
> --- 3.1-rc6/mm/filemap.c 2011-08-07 23:44:41.231928061 -0700
> +++ linux/mm/filemap.c 2011-09-14 12:24:26.431242155 -0700
> @@ -829,8 +829,8 @@ unsigned find_get_pages(struct address_s
> unsigned int ret;
> unsigned int nr_found;
>
> - rcu_read_lock();
> restart:
> + rcu_read_lock();
> nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
> (void ***)pages, NULL, start, nr_pages);
> ret = 0;
> @@ -849,12 +849,15 @@ repeat:
> * to root: none yet gotten, safe to restart.
> */
> WARN_ON(start | i);
> + rcu_read_unlock();
> goto restart;
> }
> /*
> * Otherwise, shmem/tmpfs must be storing a swap entry
> * here as an exceptional entry: so skip over it -
> - * we only reach this from invalidate_mapping_pages().
> + * we only reach this from invalidate_mapping_pages(),
> + * or SHM_UNLOCK's scan_mapping_unevictable_pages() -
> + * in each case it's correct to skip a swapped entry.
> */
> continue;
> }
> @@ -871,14 +874,17 @@ repeat:
> pages[ret] = page;
> ret++;
> }
> + rcu_read_unlock();
>
> /*
> * If all entries were removed before we could secure them,
> * try again, because callers stop trying once 0 is returned.
> */
> - if (unlikely(!ret && nr_found))
> + if (unlikely(!ret && nr_found)) {
> + cond_resched();
> + start += nr_found;

Isnt it possible to go out of initial window ?
start could be greater than 'end' ?

invalidate_mapping_pages()

does some capping (end - index)


> pagevec_init(&pvec, 0);
> while (index <= end && pagevec_lookup(&pvec, mapping, index,
> min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/