Re: [RFC PATCH 4/4] sched/numa: Don't update mm->numa_next_scan from fault path

From: Mel Gorman
Date: Tue Oct 05 2021 - 04:23:43 EST


On Mon, Oct 04, 2021 at 04:27:06PM +0530, Bharata B Rao wrote:
> p->numa_scan_period is typically scaled up or down from
> the fault path and mm->numa_next_scan is updated during
> scanning from the task_work context using cmpxchg.
>
> However there is one case where the scan period is increased
> in the fault path, but mm->numa_next_scan
>
> - is immediately updated and
> - updated without using cmpxchg
>
> Neither of the above seems intended, hence remove the update
> of mm->numa_next_scan from the fault path. The update should
> happen subsequently from task_work context.
>
> Signed-off-by: Bharata B Rao <bharata@xxxxxxx>

I believe the update was intended: it aims to reduce scanning when the
task is either completely idle or its activity is in memory ranges that
are not influenced by NUMA balancing. What is the user-visible impact
you observe?

My expectation is that in some cases this will increase the number of
PTE updates and migrations. It may even be a performance gain for some
workloads if it increases locality, but in cases where locality is poor
(e.g. heavily shared regions or cross-node migrations), there will be a
loss due to increased NUMA balancing activity.

Updating via cmpxchg would be ok to avoid potential collisions between
threads updating a shared mm.
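
To illustrate what I mean by avoiding collisions, here is a rough
userspace C11 sketch, not kernel code. The struct and function names
are made up for the example, and atomic_compare_exchange_strong()
stands in for the kernel's cmpxchg(). The update only succeeds if
mm->numa_next_scan still holds the value the caller observed, so when
several threads sharing the mm race, exactly one of them wins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared field; not struct mm_struct. */
struct fake_mm {
	_Atomic unsigned long numa_next_scan;
};

/*
 * Try to move numa_next_scan forward by 'period' from the value this
 * thread observed earlier. Returns true if this thread performed the
 * update, false if another thread changed the field in the meantime
 * (in which case the field is left untouched).
 */
static bool try_advance_next_scan(struct fake_mm *mm,
				  unsigned long observed,
				  unsigned long period)
{
	unsigned long expected = observed;

	return atomic_compare_exchange_strong(&mm->numa_next_scan,
					      &expected,
					      observed + period);
}
```

A caller that gets false back would simply skip its update, because
another thread has already pushed the next scan time out.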

--
Mel Gorman
SUSE Labs