Re: [syzbot] [mm?] kernel BUG in vma_replace_policy

From: Hugh Dickins
Date: Sat Sep 16 2023 - 00:10:10 EST


On Fri, 15 Sep 2023, Yang Shi wrote:
>
> Hi Suren and Hugh,
>
> Thanks for figuring this out. The mbind behavior is a little bit messy
> and hard to follow. I tried my best to recall all the changes.

Messy and confusing yes; and for every particular behavior, I suspect
that by now there exists some release which has done it that way.

>
> IIUC, mbind did break the vma iteration early in the first place, then
> commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") changed the behavior (didn't break vma iteration
> early for some cases anymore), but it messed up the return value and
> caused some test cases failure, also violated the manual. The return
> value issue was fixed by commit a7f40cfe3b7a ("mm: mempolicy: make
> mbind() return -EIO when MPOL_MF_STRICT is specified"), this commit
> also restored the oldest behavior (break loop early). But it also
> breaks the loop early when MPOL_MF_MOVE|MOVEALL is set, kernel should
> actually continue the loop to try to migrate all existing pages per
> the manual.

Oh, I missed that aspect in my description: yes, I think that's the
worst of it: MPOL_MF_STRICT alone could break out early because it had
nothing more to learn by going further, but it was simply a mistake for
the MOVEs to break out early (and arguable what MOVE|STRICT should do).

I thought you and I were going to have a debate about this, but we
appear to be in agreement. And I'm not sure whether I agree with
myself about whether do_mbind() should apply the mbind_range()s
when STRICT queue_pages_range() found an unmovable - there are
consistency and regression arguments both ways.

(I've been repeatedly puzzled by your comment in queue_folios_pte_range()
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
/* MPOL_MF_STRICT must be specified if we get here */
if (!vma_migratable(vma)) {
Does that commment about MPOL_MF_STRICT actually belong inside the
!vma_migratable(vma) block? Sometimes I think so, but sometimes I
remember that the interaction of those flags, and the skipping arranged
by queue_pages_test_walk(), is subtler than I imagine.)

> It sounds like a regression. I will take a look at it.

Thanks! Please do, I don't have the time for it.

>
> So the logic should conceptually look like:
>
> if (MPOL_MF_MOVE|MOVEALL)
> continue;
> if (MPOL_MF_STRICT)
> break;
>
> So it is still possible that some VMAs are not locked if only
> MPOL_MF_STRICT is set.

Conditionally, I'll agree; but it's too easy for me to agree in the
course of trying to get an email out, but on later reflection come
to disagree. STRICT|MOVE behavior arguable.

I think the best I can do is send you (privately) my approx-v5.2 patch
for this (which I never got time to put into even a Google-internal
kernel, though an earlier version was there). In part because I did
more research back then, and its commit message cites several even
older commits than you cite above, which might help to shed more light
on the history (or might just be wrong). And in part because it may
give you some more ideas of what needs doing: notably qp->nr_failed,
because "man 2 migrate_pages" says "On success migrate_pages() returns
the number of pages that could not be moved", but we seem to have
lost sight of that (from which one may conclude that it's not very
important, but I did find it useful when testing); but of course
the usual doubts about the right way to count a page when compound.

I'll check how easily that patch applies to a known base such as
v5.2, maybe trim it to fit better, then send it off to you.

Hugh