Re: mm/memory-failure: __get_any_page: unknown zero refcount

From: Kallol Biswas
Date: Wed Aug 10 2022 - 16:27:28 EST


We are probably hitting a race condition. The corresponding upstream code has since changed.

On Tue, Aug 9, 2022 at 2:28 PM Kallol Biswas <nucleodyne@xxxxxxxxx> wrote:
>
> While running a memory RAS test on a new platform, I encountered the
> following message on the 5.10.117 kernel:
>
> __get_any_page: 0x77df: unknown zero refcount page type 7fffe000000000
>
> The corresponding physical address, 0x77df000, lies in a System RAM range:
> 00100000-5c9c0017 : System RAM
>
> The page is not a huge page, is not on the free buddy list, and is not in use.
>
> __get_any_page()
> ..................
> 	if (!get_hwpoison_page(p)) {
> 		if (PageHuge(p)) {
> 			pr_info("%s: %#lx free huge page\n", __func__, pfn);
> 			ret = 0;
> 		} else if (is_free_buddy_page(p)) {
> 			pr_info("%s: %#lx free buddy page\n", __func__, pfn);
> 			ret = 0;
> 		} else if (page_count(p)) {
> 			/* raced with allocation */
> 			ret = -EBUSY;
> 		} else {
> 			pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
> 				__func__, pfn, p->flags);
>
>
> The following SPARSEMEM config options are set:
> cat /boot/config-5.10.117-2.el7.nutanix.20220304.1002776.x86_64 | grep -i sparse
> CONFIG_SPARSE_IRQ=y
> CONFIG_ARCH_SPARSEMEM_ENABLE=y
> CONFIG_ARCH_SPARSEMEM_DEFAULT=y
> CONFIG_SPARSEMEM_MANUAL=y
> CONFIG_SPARSEMEM=y
> CONFIG_SPARSEMEM_EXTREME=y
> CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
> CONFIG_SPARSEMEM_VMEMMAP=y
> CONFIG_MEMORY_HOTPLUG_SPARSE=y
>
> Can someone help me understand why such a page exists in the system,
> and what its purpose is?
>
> Thank you,
> Kallol