Re: [PATCH v2 4/8] mm/memory-failure.c: fix race with changing page more robustly

From: HORIGUCHI NAOYA(堀口 直也)
Date: Thu Feb 17 2022 - 20:13:57 EST


On Wed, Feb 16, 2022 at 05:14:27PM +0800, Miaohe Lin wrote:
> We're only intended to deal with the non-Compound page after we split thp
> in memory_failure. However, the page could have changed compound pages due
> to race window. If this happens, we could try again to hopefully handle the
> page next round. Also remove unneeded orig_head. It's always equal to the
> hpage. So we can use hpage directly and remove this redundant one.
>
> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> ---
> mm/memory-failure.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7e205d91b2d7..d66f642888be 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1690,7 +1690,6 @@ int memory_failure(unsigned long pfn, int flags)
>  {
>  	struct page *p;
>  	struct page *hpage;
> -	struct page *orig_head;
>  	struct dev_pagemap *pgmap;
>  	int res = 0;
>  	unsigned long page_flags;
> @@ -1736,7 +1735,7 @@ int memory_failure(unsigned long pfn, int flags)
>  		goto unlock_mutex;
>  	}
>
> -	orig_head = hpage = compound_head(p);
> +	hpage = compound_head(p);
>  	num_poisoned_pages_inc();
>
>  	/*
> @@ -1817,13 +1816,18 @@ int memory_failure(unsigned long pfn, int flags)
>  	lock_page(p);
>
>  	/*
> -	 * The page could have changed compound pages during the locking.
> -	 * If this happens just bail out.
> +	 * We're only intended to deal with the non-Compound page here.
> +	 * However, the page could have changed compound pages due to
> +	 * race window. If this happens, we could try again to hopefully
> +	 * handle the page next round.
>  	 */
> -	if (PageCompound(p) && compound_head(p) != orig_head) {
> -		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> -		res = -EBUSY;
> -		goto unlock_page;
> +	if (PageCompound(p)) {
> +		if (TestClearPageHWPoison(p))
> +			num_poisoned_pages_dec();
> +		unlock_page(p);
> +		put_page(p);
> +		flags &= ~MF_COUNT_INCREASED;

Could you limit the retry to a single attempt by using the local variable
"retry"? Hitting the race more than once in a single error event should be
very rare, but a bound keeps us safe from a potential infinite loop (one that
future changes could open up).
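
To illustrate the idea, here is a minimal userspace sketch of a retry that is
allowed only once. It is not the memory_failure() code itself: races_left,
page_is_compound() and handle_failure() are hypothetical stand-ins invented
for this example, and the return values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: how many more times the simulated race will be hit. */
static int races_left;

/* Hypothetical stand-in for the racy PageCompound() check: reports a
 * compound page for the next races_left calls, then a normal page. */
static bool page_is_compound(void)
{
	if (races_left > 0) {
		races_left--;
		return true;
	}
	return false;
}

/* Returns 0 when the page was handled, -1 when we give up.
 * *attempts is set to the number of passes that were made. */
static int handle_failure(int *attempts)
{
	bool retry = true;	/* permits exactly one extra pass */

	assert(attempts != NULL);
	*attempts = 0;
try_again:
	(*attempts)++;
	if (page_is_compound()) {
		if (retry) {
			retry = false;
			goto try_again;
		}
		return -1;	/* raced again: bail out instead of looping */
	}
	return 0;
}
```

With one simulated race the call succeeds on the second pass; with a
persistent race it gives up after two passes instead of spinning.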

Thanks,
Naoya Horiguchi

> +		goto try_again;
>  	}
>
>  	/*
> --
> 2.23.0