Re: [RFC PATCH 2/2] NUMA balancing: avoid to migrate task to CPU-less node

From: Huang, Ying
Date: Tue Feb 08 2022 - 03:21:03 EST


Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> writes:

> * Huang, Ying <ying.huang@xxxxxxxxx> [2022-01-28 15:51:36]:
>
>> Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> writes:
>>
>> > * Huang Ying <ying.huang@xxxxxxxxx> [2022-01-28 10:38:42]:
>> >
>> This sounds reasonable. How about the following solution? If a
>> CPU-less node is selected as the migration target, we select the nearest
>> node with CPUs instead. That is, something like the patch below.
>>
>> Best Regards,
>> Huang, Ying
>>
>> ------------------------------8<---------------------------------
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 5146163bfabb..52d926d8cbdb 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2401,6 +2401,23 @@ static void task_numa_placement(struct task_struct *p)
>> }
>> }
>>
>> + /* Cannot migrate task to CPU-less node */
>> + if (!node_state(max_nid, N_CPU)) {
>> + int near_nid = max_nid;
>> + int distance, near_distance = INT_MAX;
>> +
>> + for_each_online_node(nid) {
>> + if (!node_state(nid, N_CPU))
>> + continue;
>> + distance = node_distance(max_nid, nid);
>> + if (distance < near_distance) {
>> + near_nid = nid;
>> + near_distance = distance;
>> + }
>> + }
>> + max_nid = near_nid;
>> + }
>> +
>
>
> This looks good, but should we move this into preferred_group_nid()?

Yes. We need to take care of preferred_group_nid() too. Will do that
in the next version.
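
Something like the untested sketch below is what I have in mind: factor
the fallback out into a helper in kernel/sched/fair.c (the name
numa_nearest_cpu_node() is just a placeholder), so that both
task_numa_placement() and preferred_group_nid() can use it:

/*
 * Untested sketch: map a CPU-less node to the nearest online node
 * that has CPUs.  Nodes with CPUs are returned unchanged.
 */
static int numa_nearest_cpu_node(int nid)
{
	int n, near_nid = nid;
	int distance, near_distance = INT_MAX;

	/* Nothing to do if the node already has CPUs */
	if (node_state(nid, N_CPU))
		return nid;

	for_each_online_node(n) {
		if (!node_state(n, N_CPU))
			continue;
		distance = node_distance(nid, n);
		if (distance < near_distance) {
			near_nid = n;
			near_distance = distance;
		}
	}
	return near_nid;
}

task_numa_placement() would then just do
"max_nid = numa_nearest_cpu_node(max_nid);", and preferred_group_nid()
could pass its chosen node through the same helper before returning it.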

> I.e., should we care for the !ng case, since that would mean only private faults?

IMO we need to care for the !ng case too. If a CPU-less node has the
maximum fault count, we still need to migrate the task to the nearest
node with CPUs instead.
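
Note that in the hunk above the fallback already runs after the per-task
fault scan and before the "if (ng)" block, so it covers the private-fault
(!ng) case as well.  With the helper sketched above it would reduce to:

	/* Cannot migrate task to a CPU-less node (both ng and !ng cases) */
	max_nid = numa_nearest_cpu_node(max_nid);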

Best Regards,
Huang, Ying

>> if (ng) {
>> numa_group_count_active_nodes(ng);
>> spin_unlock_irq(group_lock);