Re: [RFC PATCH] sched/fair: Make tg->load_avg per node

From: Chen Yu
Date: Sat Apr 22 2023 - 00:02:28 EST


On 2023-03-27 at 13:39:55 +0800, Aaron Lu wrote:
>
> The reason why only cpus of one node has bigger overhead is: task_group
> is allocated on demand from a slab and whichever cpu happens to do the
> allocation, the allocated tg will be located on that node and accessing
> to tg->load_avg will have a lower cost for cpus on the same node and
> a higer cost for cpus of the remote node.
[...]
> static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> {
> long delta = cfs_rq->avg.load_avg - cfs_rq->tg_load_avg_contrib;
> + int node = cpu_to_node(cfs_rq->rq->cpu);
>
> /*
> * No need to update load_avg for root_task_group as it is not used.
> @@ -3616,7 +3617,7 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> return;
>
> if (abs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
> - atomic_long_add(delta, &cfs_rq->tg->load_avg);
> + atomic_long_add(delta, &cfs_rq->tg->node_info[node]->load_avg);
When entered enqueue_entity(cfs_rq, se) -> update_tg_load_avg(cfs_rq)
the cfs_rq->rq->cpu is not necessary the current cpu, so the node returned by
cpu_to_node(cfs_rq->rq->cpu) is not necessary the current node as the current
CPU, would atomic_add still introduce cross-node overhead due to remote access
to cfs_rq->tg->node_info[node]->load_avg in this case, or do I miss something?

thanks,
Chenyu