Re: [PATCH] sched/fair: Skip cpus with no sched domain attached during NOHZ idle balance

From: Pierre Gondois
Date: Fri Sep 22 2023 - 11:03:47 EST


Hi Rui,

On 9/20/23 09:24, Zhang, Rui wrote:
Hi, Pierre,

Sorry for the late response. I'm still ramping up on the related code.

No worries, I also need to read the code,


On Thu, 2023-09-14 at 16:53 +0200, Pierre Gondois wrote:


On 9/14/23 11:23, Zhang, Rui wrote:
Hi, Pierre,


Yes right indeed,
This happens when putting a CPU offline (as you mentioned
earlier,
putting a CPU offline clears the CPU in the idle_cpus_mask).

The load balancing related variables

including?

I meant the nohz idle variables in the load balancing, so I was
referring to:
(struct sched_domain_shared).nr_busy_cpus
(struct sched_domain).nohz_idle
nohz.idle_cpus_mask
nohz.nr_cpus
(struct rq).nohz_tick_stopped

IMO, the problem is that, for an isolated CPU,
1. it is not an idle cpu (nohz.idle_cpus_mask should be cleared)
2. it is not a busy cpu (sds->nr_busy_cpus should be decreased)

But current code does not have a third state to describe this, so we
need to either
1. add extra logic, like on_null_domain() checks

I m not sure I understand, do you mean adding on_null_domain() in addition
to the one in the present patch ?

or
2. rely on current logic, but update all related variables correctly,
like you proposed.

But in any case, we should stick with one direction.

If we follow the first one, the original patch should be used, which
IMO is simple and straight forward.
If we follow the later one, we'd better audit and remove the current
on_null_domain() usage at the same time. TBH, I'm not confident enough

Here aswell, I'm not sure I understand whether you are referring to
the on_null_domain() call in the present patch or to the on_null_domain()
calls in fair.c ?

Regards,
Pierre

to make such a change. But if you want to propose something, I'd glad
to test it.

thanks,
rui