Re: [PATCH -next] mm: hwpoison: support recovery from HugePage copy-on-write faults

From: Andrew Morton
Date: Wed Apr 12 2023 - 17:57:24 EST


On Wed, 12 Apr 2023 11:13:50 -0700 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> On 04/11/23 17:27, Liu Shixin wrote:
> > Patch a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults")
> > introduced a new copy_mc_user_highpage() function, which fixed the kernel
> > crash that occurred when the kernel was copying a normal page as the result
> > of a copy-on-write fault and ran into an uncorrectable error. But it doesn't
> > work for HugeTLB.
>
> Andrew asked about user-visible effects. Perhaps, a better way of
> stating this in the commit message might be:
>
> Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write
> faults") introduced the routine copy_mc_user_highpage() to gracefully
> handle copying of user pages with uncorrectable errors. Previously,
> such copies would result in a kernel crash. hugetlb has separate code
> paths for copy-on-write and does not benefit from the changes made in
> commit a873dfe1032a.
>
> Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage()
> so that they can also gracefully handle uncorrectable errors in user
> pages. This involves changing the hugetlb specific routine
> copy_user_folio() from type void to int so that it can return an error.
> Modify the hugetlb userfaultfd code in the same way so that it can return
> -EHWPOISON if it encounters an uncorrectable error.

Thanks, but... what are the runtime effects? What does hugetlb
presently do when encountering these uncorrectable errors?