Re: [Question]: major faults are still triggered after mlockall when numa balancing

From: Yin, Fengwei
Date: Tue Nov 14 2023 - 06:24:12 EST




On 11/13/2023 10:02 AM, Huang, Ying wrote:
>>> There are other places in the kernel where the PTE is cleared, for
>>> example, move_ptes() in mremap.c. IIUC, we need to audit all them.
>>>
>>> Another possible solution is to check PTE again with PTL held before
>>> reading in file data. This will increase the overhead of major fault
>>> path. Is it acceptable?
>> What if we check the PTE without page table lock acquired?
> The PTE is zeroed temporarily only with PTL held. So, if we acquire the
> PTL in filemap_fault() and check the PTE, the PTE which is zeroed in
> do_numa_page() will be non-zero now. So we can avoid the major fault.
Yes.

>
> But, if we don't acquire the PTL, the PTE may still be zero.
do_numa_page()/change_pte_range() do very little while the PTE is
cleared. Considering the code path of do_read_fault(), it's likely
that the PTE is non-zero.

My concern with acquiring the lock is that it adds an extra PTL
acquire/release to the other, more common cases.
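
To make the trade-off concrete, here is a rough userspace model. It is
not kernel code, and every name in it (fake_pte, pte_none_lockless,
pte_none_locked, updater, run_demo) is hypothetical. A pthread mutex
stands in for the PTL and an atomic word stands in for the PTE. The
updater holds the "cleared" window open artificially so the lockless
reader is guaranteed to see the transient zero. In the real
do_numa_page()/change_pte_range() paths the window is short, so a
lockless read would usually see a non-zero value.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these names are not kernel APIs. */
struct fake_pte {
    _Atomic uint64_t val;   /* 0 stands in for pte_none() */
    pthread_mutex_t ptl;    /* stands in for the page table lock */
    atomic_int cleared;     /* updater -> reader: PTE is now zero */
    atomic_int go;          /* reader -> updater: restore the PTE */
};

/* Lockless check: cheap, but can observe the transient zero. */
static bool pte_none_lockless(struct fake_pte *p)
{
    return atomic_load(&p->val) == 0;
}

/* Locked check: waits out any in-flight update, costs a lock round trip. */
static bool pte_none_locked(struct fake_pte *p)
{
    int rc = pthread_mutex_lock(&p->ptl);
    assert(rc == 0);
    bool none = atomic_load(&p->val) == 0;
    pthread_mutex_unlock(&p->ptl);
    return none;
}

/* Models the clear-then-restore window, entirely under the lock. */
static void *updater(void *arg)
{
    struct fake_pte *p = arg;

    pthread_mutex_lock(&p->ptl);
    uint64_t old = atomic_exchange(&p->val, 0);  /* temporarily zero */
    atomic_store(&p->cleared, 1);
    while (!atomic_load(&p->go))
        ;                                        /* hold the window open */
    atomic_store(&p->val, old);                  /* restore the old value */
    pthread_mutex_unlock(&p->ptl);
    return NULL;
}

/* Returns a bitmask: bit 0 = lockless saw none, bit 1 = locked saw none. */
static int run_demo(void)
{
    struct fake_pte p = { .val = 0x1234 };
    pthread_t t;
    int seen;

    pthread_mutex_init(&p.ptl, NULL);
    pthread_create(&t, NULL, updater, &p);
    while (!atomic_load(&p.cleared))
        ;                                        /* wait for the window */
    seen = pte_none_lockless(&p) ? 1 : 0;        /* read inside the window */
    atomic_store(&p.go, 1);
    seen |= pte_none_locked(&p) ? 2 : 0;         /* read after the window */
    pthread_join(t, NULL);
    pthread_mutex_destroy(&p.ptl);
    return seen;
}
```

In this model the locked check never reports "none" for a PTE that is
only temporarily cleared, but it pays for a lock acquire/release on
every call, including the calls where no update is in flight.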

Regards
Yin, Fengwei

>
> --
> Best Regards,
> Huang, Ying