Re: [PATCH v2 1/5] mm/hugetlb: fix races when looking up a CONT-PTE size hugetlb page

From: David Hildenbrand
Date: Wed Aug 24 2022 - 07:56:00 EST


On 24.08.22 11:41, Baolin Wang wrote:
>
>
> On 8/24/2022 3:31 PM, David Hildenbrand wrote:
>>>>>>
>>>>>> IMHO, these follow_huge_xxx() functions were arch-specific at first and
>>>>>> were moved into the common hugetlb.c by commit 9e5fc74c3025 ("mm:
>>>>>> hugetlb: Copy general hugetlb code from x86 to mm"), and now there are
>>>>>> still some arch-specific follow_huge_xxx() definitions, for example:
>>>>>> ia64: follow_huge_addr
>>>>>> powerpc: follow_huge_pd
>>>>>> s390: follow_huge_pud
>>>>>>
>>>>>> What I mean is that follow_hugetlb_page() is a common,
>>>>>> non-arch-specific function; is it suitable to make it
>>>>>> arch-specific?
>>>>>> And thinking more, can we rename follow_hugetlb_page() to
>>>>>> hugetlb_page_faultin() and simplify it to only handle hugetlb page
>>>>>> faults, like faultin_page() does for normal pages? That means we can
>>>>>> make sure only follow_page_mask() handles hugetlb.
>>>>>>
>>>>
>>>> Something like that might work, but you still have two page table walkers
>>>> for hugetlb. I like David's idea (if I understand it correctly) of
>>>
>>> What I mean is we may change the hugetlb handling to work like it does
>>> for normal pages:
>>> 1) use follow_page_mask() to look up the hugetlb page first.
>>> 2) if the hugetlb page cannot be found, try to fault it in via
>>> hugetlb_page_faultin().
>>> 3) if the page fault succeeded, retry the lookup via
>>> follow_page_mask().
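
[A rough userspace sketch of the three steps quoted above, for
illustration only. Everything in it (struct fake_page, lookup(),
faultin(), get_page_retry()) is a hypothetical stand-in and not kernel
code; lookup() only models the role of follow_page_mask() and faultin()
only models the proposed hugetlb_page_faultin(). It does no locking and
does not model the migration race discussed in this thread.]

```c
#include <stddef.h>

/* Hypothetical stand-in for a mapped page. */
struct fake_page {
	int id;
};

static struct fake_page backing = { .id = 42 };

/* Hypothetical page table slot: NULL means "not present". */
static struct fake_page *slot;

/* Steps 1 and 3: pure lookup that never triggers a fault. */
static struct fake_page *lookup(void)
{
	return slot;
}

/* Step 2: populate the slot; returns 0 on success. */
static int faultin(void)
{
	slot = &backing;
	return 0;
}

/* Look up, fault in on a miss, then retry the lookup. */
static struct fake_page *get_page_retry(int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		struct fake_page *page = lookup();

		if (page)
			return page;
		if (faultin() != 0)
			return NULL;	/* fault failed, give up */
	}
	return NULL;		/* ran out of retries */
}
```

[With the slot initially empty, get_page_retry(2) misses once, faults
the page in, and finds it on the second lookup. The point of the shape
is that only lookup() ever walks the "page table"; faultin() never
returns a page itself.]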
>>
>> That implies putting more hugetlbfs special code into generic GUP,
>> making it even more complicated. But of course, it depends on what the
>> end result looks like. My gut feeling was that hugetlb is better handled
>> in follow_hugetlb_page() separately (just like we do with a lot of other
>> page table walkers).
>
> OK, fair enough.
>
>>>
>>> Just a rough thought, and I need more investigation for my idea and
>>> David's idea.
>>>
>>>> using follow_hugetlb_page for both cases. As noted, it will need to be
>>>> taught how to not trigger faults in the follow_page_mask case.
>>>
>>> Anyway, I also agree we need some cleanup, and first I think we should
>>> clean up the arch-specific follow_huge_xxx() functions on architectures
>>> where they are similar to the common ones. I will look into these.
>>
>> There was a recent discussion on that, e.g.:
>>
>> https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad
>
> Thanks.
>
>>
>>>
>>> However, considering that the cleanup may need more investigation and
>>> refactoring, I would prefer to get the bug-fix patches of this patchset
>>> into mainline first, since they are suitable for backporting to old
>>> versions to fix potential race issues. Mike and David, what do you
>>> think? Could you help to review these patches? Thanks.
>>
>> Patch #1 certainly adds more special code just to handle another hugetlb
>> corner case (CONT pages), and maybe just making it all use
>> follow_hugetlb_page() would be even cleaner and less error prone.
>>
>> I agree that locking is shaky, but I'm not sure if we really want to
>> backport this to stable trees:
>>
>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>
>> "It must fix a real bug that bothers people (not a, “This could be a
>> problem...” type thing)."
>>
>>
>> Do we actually have any instance of this being a real (and not a
>> theoretical) problem? If not, I'd rather clean it all up right away.
>
> I think this is a real problem (not theoretical), and it is easy to write
> some code to show the issue. For example, suppose thread A is trying to
> look up a CONT-PTE size hugetlb page under the lock, while another
> thread B migrates the CONT-PTE hugetlb page at the same time. This will
> cause thread A to get an incorrect page, and if thread A then does
> something with this incorrect page, an error occurs.
>
> Actually we also want to backport these fixes to distros with old
> kernel versions to make hugetlb more stable. Otherwise we will hit
> these issues sooner or later if customers use CONT-PTE/PMD hugetlb.
>
> Anyway, if you and Mike still think these issues are not important
> enough to be fixed in the old versions, I can do the cleanup first.
>

[asking myself which follow_page() users actually care about hugetlb,
and why we need this handling in follow_page at all]

Which follow_page() user do we care about here? Primarily only
mm/migrate.c, I assume?

--
Thanks,

David / dhildenb