Re: [RFC PATCH 4/4] sched/numa: Don't update mm->numa_next_scan from fault path

From: Bharata B Rao
Date: Tue Oct 05 2021 - 05:10:35 EST



On 10/5/2021 1:53 PM, Mel Gorman wrote:
> On Mon, Oct 04, 2021 at 04:27:06PM +0530, Bharata B Rao wrote:
>> p->numa_scan_period is typically scaled up or down from
>> the fault path and mm->numa_next_scan is updated during
>> scanning from the task_work context using cmpxchg.
>>
>> However there is one case where the scan period is increased
>> in the fault path, but mm->numa_next_scan
>>
>> - is immediately updated and
>> - updated without using cmpxchg
>>
>> Both of the above don't seem intended and hence remove
>> the updation of mm->numa_next_scan from the fault path
>> Updation should happen from task_work context subsequently.
>>
>> Signed-off-by: Bharata B Rao <bharata@xxxxxxx>
>
> I believe the update was intended because it aims to reduce scanning
> when the task is either completely idle or activity is in memory ranges
> that are not influenced by numab. What is the user-visible impact you
> observe?

I haven't measured, it just appeared unintended when glancing at
the code, but now you have clarified it.

>
> My expectation is that in some cases this will increase the number of
> PTE updates and migrations. It may even be a performance gain for some
> workloads if it increases locality but in cases where locality is poor
> (e.g. heavily shared regions or cross-node migrations), there will be a
> loss due to increased numab activity.

Thanks, I will check if I can measure and verify the above.

>
> Updating via cmpxchg would be ok to avoid potential collisions between
> threads updating a shared mm.

Ok, may be I could just resend with changing the scan period update
to use cmpxchg.

I also notice that in this case of scan period update, we just return
without resetting the p->numa_faults_locality[]. Do you think if
skipping the reset doesn't matter in this case?

Regards,
Bharata.