From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
After the recent soft-offline rework, error pages can be taken off the
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation. Moreover, due to the recent change to
__get_hwpoison_page(), get_page_unless_zero() is hardly ever called for
hwpoisoned pages. So __get_hwpoison_page() mostly returns zero (meaning
it failed to grab a page refcount) and unpoison just clears PG_hwpoison
without releasing a refcount. That does not lead to a critical issue
like a kernel panic, but unpoisoned pages never get back to the buddy
allocator (they are leaked permanently), which is not good.
To fix this, we need to distinguish "taken off" pages from other types of
hwpoisoned pages. We can't use the refcount or page flags for this purpose,
so a pseudo flag is defined by hacking the ->private field.
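As a rough illustration of the pseudo-flag idea, here is a user-space
sketch, not the kernel code. struct model_page, MODEL_TAKEN_OFF_MAGIC and
the model_*() helpers are hypothetical names invented for this example;
the real field layout, magic value and helper names in the patch may
differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct page; not the kernel layout. */
struct model_page {
	bool hwpoison;          /* models PG_hwpoison */
	int refcount;           /* models the page refcount */
	unsigned long private;  /* models page->private */
};

/* Hypothetical marker value meaning "taken off the buddy allocator". */
#define MODEL_TAKEN_OFF_MAGIC 0x1UL

/* Record in ->private that the page was removed from the allocator. */
static void model_mark_taken_off(struct model_page *p)
{
	assert(p != NULL);
	p->private = MODEL_TAKEN_OFF_MAGIC;
}

/* Drop the marker again, e.g. when the page is unpoisoned. */
static void model_clear_taken_off(struct model_page *p)
{
	assert(p != NULL);
	p->private = 0;
}

/* A page counts as "taken off" only if it is poisoned and carries the marker. */
static bool model_is_taken_off(const struct model_page *p)
{
	assert(p != NULL);
	return p->hwpoison && p->private == MODEL_TAKEN_OFF_MAGIC;
}
```

The point of the sketch is only that the marker lives in a field that is
otherwise unused for such pages, so neither the refcount nor a real page
flag has to be consumed.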
Sometimes hwpoisoned pages can still be in use, in which case the refcount
is more than 1, so we can't unpoison them immediately and need to wait
until all users release their refcounts.
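The in-use check can be pictured with the following user-space sketch.
It is a simplified model under stated assumptions, not the logic of
unpoison_memory(): struct sim_page, enum sim_result and sim_unpoison()
are hypothetical, and the "refcount > 1 means still in use" rule is taken
directly from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a hwpoisoned page. */
struct sim_page {
	bool hwpoison;  /* models PG_hwpoison */
	int refcount;   /* models the page refcount */
};

enum sim_result {
	SIM_UNPOISONED,    /* flag cleared and last reference dropped */
	SIM_BUSY,          /* other users still hold references; retry later */
	SIM_NOT_POISONED   /* nothing to undo */
};

/* Unpoison only when no other user holds a reference to the page. */
static enum sim_result sim_unpoison(struct sim_page *p)
{
	assert(p != NULL);

	if (!p->hwpoison)
		return SIM_NOT_POISONED;

	/* More than one reference: the page is still in use, so back off. */
	if (p->refcount > 1)
		return SIM_BUSY;

	p->hwpoison = false;
	p->refcount = 0;  /* models handing the page back to the allocator */
	return SIM_UNPOISONED;
}
```

A caller that gets SIM_BUSY would try again after the remaining users
have dropped their references.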
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
---