Re: [Question]: major faults are still triggered after mlockall when numa balancing

From: Huang, Ying
Date: Tue Nov 14 2023 - 20:48:13 EST


"Yin, Fengwei" <fengwei.yin@xxxxxxxxx> writes:

> On 11/13/2023 10:02 AM, Huang, Ying wrote:
>>>> There are other places in the kernel where the PTE is cleared, for
>>>> example, move_ptes() in mremap.c. IIUC, we need to audit all of them.
>>>>
>>>> Another possible solution is to check PTE again with PTL held before
>>>> reading in file data. This will increase the overhead of major fault
>>>> path. Is it acceptable?
>>> What if we check the PTE without acquiring the page table lock?
>> The PTE is zeroed temporarily only with PTL held. So, if we acquire the
>> PTL in filemap_fault() and check the PTE, the PTE which is zeroed in
>> do_numa_page() will be non-zero now. So we can avoid the major fault.
> Yes.
>
>>
>> But, if we don't acquire the PTL, the PTE may still be zero.
> For do_numa_page()/change_pte_range(), very limited work is done while
> the PTE is cleared. Considering the code path of do_read_fault(), it's
> likely that the PTE is non-zero by the time we check it.

That is possible as far as I understand, although it doesn't feel good
to depend on winning a "race".
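
To make the difference concrete, here is a minimal user-space model of
the two checks. It is only a sketch and not kernel code: the step names,
struct model, and the PTE values are hypothetical, and the "writer"
stands in for a path such as do_numa_page() that clears and restores the
PTE with the PTL held. It enumerates the points at which a reader could
sample the PTE, with and without taking the lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps of a writer that temporarily clears the PTE. */
enum writer_step { W_IDLE, W_LOCKED, W_CLEARED, W_RESTORED, W_UNLOCKED, W_NSTEPS };

struct model {
    bool lock_held;      /* stands in for the PTL */
    unsigned long pte;   /* stands in for the PTE value */
};

/* State of the model once the writer has completed `step`. */
struct model state_after(enum writer_step step,
                         unsigned long old_pte, unsigned long new_pte)
{
    struct model m = { .lock_held = false, .pte = old_pte };

    if (step >= W_LOCKED)
        m.lock_held = true;
    if (step >= W_CLEARED)
        m.pte = 0;
    if (step >= W_RESTORED)
        m.pte = new_pte;
    if (step >= W_UNLOCKED)
        m.lock_held = false;
    return m;
}

/*
 * Count the writer steps at which a reader would observe a zero PTE.
 * With with_lock set, the reader must take the lock first, so it can
 * only sample at steps where the writer does not hold it.
 */
size_t zero_observations(bool with_lock)
{
    size_t zeros = 0;

    for (int s = W_IDLE; s < W_NSTEPS; s++) {
        struct model m = state_after((enum writer_step)s, 0x1000, 0x2000);

        if (with_lock && m.lock_held)
            continue;   /* reader would block here until unlock */
        if (m.pte == 0)
            zeros++;
    }
    return zeros;
}
```

In this model the locked reader never sees zero, while the lockless
reader sees zero only if it samples inside the short cleared window.
That window is the "race" mentioned above.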

> My concern with acquiring the lock is that it adds an extra PTL
> acquire/release for the other, more common cases.

Yes. Acquiring the PTL will add some overhead.

Anyway, some performance testing is needed to compare the solutions.
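
As a starting point for such a test, a user-space helper along the
following lines could count the major faults taken while touching a
region. This is a rough sketch: the function names are made up for
illustration, and it relies only on getrusage() and its ru_majflt
counter. It does not measure the PTL overhead itself, only whether major
faults still occur.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Cumulative major-fault count of this process, or -1 on error. */
long major_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/*
 * Read one byte from every page of [buf, buf + len) and return the
 * number of major faults taken while doing so, or -1 on error.
 * The volatile qualifier keeps the compiler from dropping the reads.
 */
long major_faults_while_touching(volatile unsigned char *buf,
                                 size_t len, size_t page_size)
{
    long before = major_faults();

    for (size_t off = 0; off < len; off += page_size)
        (void)buf[off];

    long after = major_faults();

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

The intended use would be to mmap() a file, call
mlockall(MCL_CURRENT | MCL_FUTURE), and then run
major_faults_while_touching() on the mapping repeatedly with NUMA
balancing enabled. A non-zero result after the lock succeeds would be
consistent with the problem reported in this thread.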

--
Best Regards,
Huang, Ying