RE: [PATCH 1/1] mm/migrate: fix list corruption in migration of non-LRU movable pages

From: sunqiuyang
Date: Tue Sep 03 2019 - 22:18:45 EST


The isolate path of non-lru movable pages:

isolate_migratepages_block
isolate_movable_page
trylock_page
// if PageIsolated, goto out_no_isolated
a_ops->isolate_page
__SetPageIsolated
unlock_page
list_add(&page->lru, &cc->migratepages)

The migration path:

unmap_and_move
__unmap_and_move
lock_page
move_to_new_page
a_ops->migratepage
__ClearPageIsolated
unlock_page
/* here, the page could be isolated again by another thread, and added into another cc->migratepages,
since PG_Isolated has been cleared, and not protected by page_lock */
list_del(&page->lru)

Suppose thread A isolates three pages in the order p1, p2, p3, A's cc->migratepages will be like
head_A - p3 - p2 - p1
After p2 is migrated (but before list_del), it is isolated by another thread B. Then list_del will delete p2
from the cc->migratepages of B (instead of A). When A continues to migrate and delete p1, it will find:
p1->prev == p2
p2->next == LIST_POISON1.

So we will end up with a bug like
"list_del corruption. prev->next should be ffffffbf0a1eb8e0, but was dead000000000100"
(see __list_del_entry_valid).


________________________________________
From: Michal Hocko [mhocko@xxxxxxxxxx]
Sent: Tuesday, September 03, 2019 21:17
To: sunqiuyang
Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
Subject: Re: [PATCH 1/1] mm/migrate: fix list corruption in migration of non-LRU movable pages

On Tue 03-09-19 16:27:46, sunqiuyang wrote:
> From: Qiuyang Sun <sunqiuyang@xxxxxxxxxx>
>
> Currently, after a page is migrated, it
> 1) has its PG_isolated flag cleared in move_to_new_page(), and
> 2) is deleted from its LRU list (cc->migratepages) in unmap_and_move().
> However, between steps 1) and 2), the page could be isolated by another
> thread in isolate_movable_page(), and added to another LRU list, leading
> to list_del corruption later.

Care to explain the race? Both paths use page_lock AFAICS
>
> This patch fixes the bug by moving list_del into the critical section
> protected by lock_page(), so that a page will not be isolated again before
> it has been deleted from its LRU list.
>
> Signed-off-by: Qiuyang Sun <sunqiuyang@xxxxxxxxxx>
> ---
> mm/migrate.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a42858d..c58a606 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1124,6 +1124,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> /* Drop an anon_vma reference if we took one */
> if (anon_vma)
> put_anon_vma(anon_vma);
> + if (rc != -EAGAIN)
> + list_del(&page->lru);
> unlock_page(page);
> out:
> /*
> @@ -1190,6 +1192,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
> put_new_page(newpage, private);
> else
> put_page(newpage);
> + list_del(&page->lru);
> goto out;
> }
>
> @@ -1200,14 +1203,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
> out:
> if (rc != -EAGAIN) {
> /*
> - * A page that has been migrated has all references
> - * removed and will be freed. A page that has not been
> - * migrated will have kepts its references and be
> - * restored.
> - */
> - list_del(&page->lru);
> -
> - /*
> * Compaction can migrate also non-LRU pages which are
> * not accounted to NR_ISOLATED_*. They can be recognized
> * as __PageMovable
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs