Re: [PATCH v4 1/2] mm,hwpoison: fix race with compound page allocation

From: Oscar Salvador
Date: Mon May 17 2021 - 06:12:58 EST


On Mon, May 17, 2021 at 01:54:00PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> When hugetlb page fault (under overcommitting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
>
>     CPU0:                           CPU1:
>
>                                     gather_surplus_pages()
>                                       page = alloc_surplus_huge_page()
>     memory_failure_hugetlb()
>       get_hwpoison_page(page)
>         __get_hwpoison_page(page)
>           get_page_unless_zero(page)
>                                       zero = put_page_testzero(page)
>                                       VM_BUG_ON_PAGE(!zero, page)
>                                     enqueue_huge_page(h, page)
>       put_page(page)
>
> __get_hwpoison_page() only checks page refcount before taking additional
Nit: "checks the page refcount"? And "taking an additional one".
> one for memory error handling, which is wrong because there's a time
> window where compound pages have non-zero refcount during initialization.
>
> So makes __get_hwpoison_page() check page status a bit more for a few
Nit: s/makes/make/
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.

This is no longer true with this patch, is it? What happened here?

>  static int __get_hwpoison_page(struct page *page)
>  {
>  	struct page *head = compound_head(page);
> +	int ret = 0;
> +
> +#ifdef CONFIG_HUGETLB_PAGE
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
> +		ret = get_page_unless_zero(head);
> +	spin_unlock(&hugetlb_lock);
> +	if (ret > 0)
> +		return ret;
> +#endif

I am kind of fine with this, but I wonder whether it makes sense to hide these
details in a helper (with an empty stub for non-hugetlb configurations)?
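
To illustrate the shape I have in mind, here is an untested userspace model,
not kernel code. Everything prefixed with model_ or MODEL_ is made up for the
example: struct model_page stands in for struct page, its boolean fields stand
in for PageHuge()/HPageFreed()/HPageMigratable(), and
model_get_hwpoison_huge_page() is only a hypothetical name for the helper. The
locking is shown as comments because there is no hugetlb_lock in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for CONFIG_HUGETLB_PAGE; set to 0 to get the empty stub. */
#define MODEL_CONFIG_HUGETLB_PAGE 1

/* Toy replacement for struct page, only what the example needs. */
struct model_page {
	int refcount;
	bool huge;		/* models PageHuge() */
	bool freed;		/* models HPageFreed() */
	bool migratable;	/* models HPageMigratable() */
};

/* Models get_page_unless_zero(): take a reference only if one exists. */
static int model_get_page_unless_zero(struct model_page *page)
{
	if (page->refcount == 0)
		return 0;
	page->refcount++;
	return 1;
}

#if MODEL_CONFIG_HUGETLB_PAGE
/* Hypothetical helper: the hugetlb-specific check lives here. */
static int model_get_hwpoison_huge_page(struct model_page *head)
{
	int ret = 0;

	assert(head != NULL);
	/* In the kernel: spin_lock(&hugetlb_lock); */
	if (head->huge && (head->freed || head->migratable))
		ret = model_get_page_unless_zero(head);
	/* In the kernel: spin_unlock(&hugetlb_lock); */
	return ret;
}
#else
/* Empty stub when hugetlb support is compiled out. */
static inline int model_get_hwpoison_huge_page(struct model_page *head)
{
	(void)head;
	return 0;
}
#endif

/* The caller no longer needs an #ifdef of its own. */
static int model_get_hwpoison_page(struct model_page *head)
{
	int ret = model_get_hwpoison_huge_page(head);

	if (ret > 0)
		return ret;
	/* The THP and other checks would follow here. */
	return 0;
}
```

With that split, a hugetlb page that is still being set up (neither freed nor
migratable in this model) is not pinned by the helper even though its refcount
is non-zero, and the #ifdef stays out of the caller.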

>  	if (!PageHuge(head) && PageTransHuge(head)) {
This !PageHuge could go?


--
Oscar Salvador
SUSE L3