RE: [RFC PATCH] mm, hwpoison: Recover from copy-on-write machine checks

From: Luck, Tony
Date: Tue Oct 18 2022 - 13:52:22 EST


>> + * -1 = copy failed due to poison in source page

> Simply calling "poison" might cause confusion with page poisoning feature,
> so "hwpoison" might be better. But I know that "poison" is commonly used
> under arch/x86, and it's not clear to me what to do with this terminology.
> Maybe using -EHWPOISON instead of -1 might be helpful to the distinction.

Agreed. Returning -EHWPOISON is clearer here.
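
To make the return convention concrete, here is a minimal user-space sketch, not the patch itself. The names mock_page and mock_copy_user are hypothetical stand-ins for struct page and __wp_page_copy_user(), and the "hwpoison" flag only models a source page with an uncorrectable error. It covers only the two outcomes discussed in this thread (success, and poison in the source page).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fallback for non-Linux builds; 133 is the asm-generic Linux value. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical stand-in for struct page; not the kernel type. */
struct mock_page {
	bool hwpoison;		/* models an uncorrectable error in this page */
	char data[64];
};

/*
 * Hypothetical stand-in for __wp_page_copy_user().
 *
 * Return:
 *   0          = copy succeeded
 *   -EHWPOISON = copy failed due to hwpoison in source page
 */
static int mock_copy_user(struct mock_page *dst, const struct mock_page *src)
{
	if (src->hwpoison)
		return -EHWPOISON;

	memcpy(dst->data, src->data, sizeof(dst->data));
	return 0;
}
```

A caller can then test "ret == -EHWPOISON" rather than "ret == -1", which keeps the hwpoison case distinct from the page poisoning feature and from any other failure code.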

>> - if (!__wp_page_copy_user(new_page, old_page, vmf)) {
>> + ret = __wp_page_copy_user(new_page, old_page, vmf);
>> + if (ret == -1) {
>> + put_page(new_page);
>
> Maybe I miss something, but don't you have to care about refcount of
> old_page in this branch (as done in "ret == 0" branch)?

You didn't miss anything. But I did. More needs to be done with old_page
(it is where the poison is). I got "lucky" just ignoring/forgetting about it in
my patch ... the system just happened to recover, but I think the poison
may not have been handled for the parent process, which still has the page
mapped. Need to think about this more.
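
For the reference-count half of the question only, a toy user-space model is below. Everything in it is hypothetical (toy_page, toy_put_page, toy_wp_copy_finish are not kernel symbols), and it assumes the fault path holds one reference on old_page and one on the freshly allocated new_page when the copy is attempted. It does not attempt the hwpoison handling for the parent's mapping, which is the part that still needs thought.

```c
#include <errno.h>
#include <stddef.h>

/* Fallback for non-Linux builds; 133 is the asm-generic Linux value. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical page with nothing but a reference count. */
struct toy_page {
	int refcount;
};

static void toy_put_page(struct toy_page *p)
{
	p->refcount--;
}

/*
 * copy_ret stands in for the result of the page copy.
 * Assumption: on entry the fault path holds one reference on each page.
 * On a poisoned source, drop both references before bailing out, so the
 * error branch is balanced the same way for old_page and new_page.
 */
static int toy_wp_copy_finish(struct toy_page *old_page,
			      struct toy_page *new_page, int copy_ret)
{
	if (copy_ret == -EHWPOISON) {
		toy_put_page(new_page);
		if (old_page)
			toy_put_page(old_page);
	}
	return copy_ret;
}
```

In this model, forgetting the old_page put leaves its count one too high after the error path, which is the imbalance the review comment points at.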

Thanks for the review.

-Tony