Re: [PATCH v1] sched/fair: fix inconsistency in update_task_scan_period

From: zhaomengmeng
Date: Tue Apr 25 2023 - 22:05:21 EST


On 2023/4/25 18:24, Peter Zijlstra wrote:
On Tue, Apr 25, 2023 at 06:02:04AM -0400, zhaomzhao@xxxxxxx wrote:
From: Zhao Mengmeng <zhaomengmeng@xxxxxxxxxx>

During calculate numa_scan_period diff, the actual code
and the comment are inconsistent. The comment says it is
using shared faults ratio, but code uses private faults
ratio. This patch fixes it.

So for some reason you think the comment is correct ? You also don't
discuss the performance changes caused by changing the code.

Sorry for the repeated mail and thanks for your comments.

First, I'd like to make some additions. When I was studying the numa-balancing code and found this inconsistency, I feel like the same as you: the comment is out-dated and not matching the code , so I was intended to change the comment. But after some git digging work, I change my mind.

Function update_task_scan_period was added in "04bb2f9475054: sched/numa: Adjust scan rate in task_numa_placement", which set up the basis. Though the details logic inside the function may change a lot, this function's outer-side effect keep the same.
"""
Increase the scan period (slow down scanning) if the majority of our memory is
already on our local node, or if the majority of the page accesses are shared
with other processes
"""
Later, in commit "37ec97deb3a8c: sched/numa: Slow down scan rate if shared faults dominate", the author claims that
"""
However, with the current code, all a high ratio of shared accesses
does is slow down the rate at which scanning is made faster.

This patch changes things so either lots of shared accesses or
lots of local accesses will slow down scanning, and numa scanning
is sped up only when there are lots of private faults on remote
memory pages.
"""
But the actual code logic doesn't reflect his intention in commit log, which resulting the latest code used by kernel. Based on the this, I change the code and make this patch. That is *the reason I think the comment is correct*.

Second, the performance. I'm sorry that it is the shortage of my work. Which benchmark shall I use, like autonuma-benchmark? I'm lack of physical server with multi numa node for testing, is it enough to offer a virtual machine testing result? Besides, this patch just meant to achieve the initial design, not for optimization, is a performance result necessary?

Best regards.