Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

From: John Hubbard
Date: Mon Jul 15 2019 - 19:34:36 EST


On 7/15/19 3:00 PM, Andrew Morton wrote:
> On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
>
>>
>> On 7/9/19 5:28 PM, Andrew Morton wrote:
>>> On Tue, 9 Jul 2019 15:35:56 -0700 Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
>>>
>>>> When migrating a ZONE device private page from device memory to system
>>>> memory, the subpage pointer is initialized from a swap pte which computes
>>>> an invalid page pointer. A kernel panic results such as:
>>>>
>>>> BUG: unable to handle page fault for address: ffffea1fffffffc8
>>>>
>>>> Initialize subpage correctly before calling page_remove_rmap().
>>>
>>> I think this is
>>>
>>> Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
>>> Cc: stable
>>>
>>> yes?
>>>
>>
>> Yes. Can you add this or should I send a v2?
>
> I updated the patch. Could we please have some review input?
>
>
> From: Ralph Campbell <rcampbell@xxxxxxxxxx>
> Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
>
> When migrating a ZONE device private page from device memory to system
> memory, the subpage pointer is initialized from a swap pte which computes
> an invalid page pointer. A kernel panic results such as:
>
> BUG: unable to handle page fault for address: ffffea1fffffffc8
>
> Initialize subpage correctly before calling page_remove_rmap().
>
> Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampbell@xxxxxxxxxx
> Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
> Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
> Cc: "JÃrÃme Glisse" <jglisse@xxxxxxxxxx>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> mm/rmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
> --- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
> +++ a/mm/rmap.c
> @@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
> * No need to invalidate here it will synchronize on
> * against the special swap migration pte.
> */
> + subpage = page;
> goto discard;
> }
>

Hi Ralph and everyone,

While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more
about how this is triggered, in the commit description.)

I'll take a closer look at possible fixes (I have to step out for a bit) soon,
but any more experienced help is also appreciated here.

thanks,
--
John Hubbard
NVIDIA