Re: EEVDF and NUMA balancing

From: Julia Lawall
Date: Thu Jan 04 2024 - 11:45:20 EST




On Thu, 4 Jan 2024, Vincent Guittot wrote:

> On Fri, 29 Dec 2023 at 16:18, Julia Lawall <julia.lawall@xxxxxxxx> wrote:
> >
> >
> >
> > On Thu, 28 Dec 2023, Julia Lawall wrote:
> >
> > > > > > > > > I'm surprised that you have mainly CPU_NEWLY_IDLE. Do you know the reason ?
> > > > > > > >
> > > > > > > > No. They come from do_idle calling the scheduler. I will look into why
> > > > > > > > this happens so often.
> > > > > > >
> > > > > > > Hmm, the CPU was idle and received a need resched which triggered the
> > > > > > > scheduler but there was nothing to schedule so it goes back to idle
> > > > > > > > after running a newly_idle load_balance.
> > > > > >
> > > > > > I spent quite some time thinking the same until I saw the following code
> > > > > > in do_idle:
> > > > > >
> > > > > > preempt_set_need_resched();
> > > > > >
> > > > > > So I have the impression that do_idle sets need resched itself.
> > > > >
> > > > > But of course that code is only executed if need_resched is true. But I
> > > >
> > > > > Yes, that is your root cause. Something, most probably in interrupt
> > > > > context, wakes up your CPU and expects to wake up a thread
> > > >
> > > > > don't know who would be setting need resched on each clock tick.
> > > >
> > > > that can be a timer, interrupt, ipi, rcu ...
> > > > a trace should give you some hints
> > >
> > > I have the impression that it is the goal of calling nohz_csd_func on each
> > > clock tick that causes the calls to need_resched. If the idle process is
> > > polling, call_function_single_prep_ipi just sets need_resched to get the
>
> Is your system using polling mode rather than the default
> cpuidle_idle_call()? This could explain why I don't see this problem
> on my system, which doesn't have polling.
>
> Are you forcing the use of polling mode ?
> If yes, could you check that this problem disappears without forcing
> polling mode ?

I'll check. I didn't explicitly set anything, but I don't really know
what my configuration file does.

>
> > > idle process to stop polling. But there is no actual task that the idle
> > > process should schedule. The need_resched then prevents the idle process
> > > from stealing, due to the CPU_NEWLY_IDLE flag, contradicting the whole
> > > purpose of calling nohz_csd_func in the first place.
>
> Do I understand correctly that your sequence is:
>
> CPU A                                  CPU B
> cpu enters idle
> do_idle()
> ...
> loop in cpu_idle_poll
> ...
>                                        kick_ilb on CPU A
>                                        send_call_function_single_ipi
>                                        set_nr_if_polling
>                                        set TIF_NEED_RESCHED
>
> exit polling loop
> exit while (!need_resched())
>
> call nohz_csd_func but
> need_resched is true so it's a nope
>
> pick_next_task_fair
> newidle_balance
> load_balance(CPU_NEWLY_IDLE)

Yes, this looks correct.
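
To make the sequence concrete, here is a minimal userspace sketch of it.
This is a toy model, not kernel code: struct toy_cpu, its fields and the
toy_* functions are all hypothetical names, and each one only mimics the
behaviour described in this thread (set_nr_if_polling setting
TIF_NEED_RESCHED instead of sending an IPI, nohz_csd_func bailing out when
need_resched() is true, and the idle task then reaching
load_balance(CPU_NEWLY_IDLE)). The check_need_resched parameter is an
assumption added to model the experiment of removing that test.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names here are illustrative only. */
struct toy_cpu {
	bool polling;            /* idle task is in the polling loop */
	bool need_resched;       /* stands in for TIF_NEED_RESCHED */
	bool ilb_kick_pending;   /* a nohz_csd_func call is queued */
	bool softirq_raised;     /* stands in for raising SCHED_SOFTIRQ */
	bool newly_idle_balance; /* idle task ran a CPU_NEWLY_IDLE balance */
};

/*
 * CPU B side: kick the idle load balancer on the target CPU.
 * Returns true if a real IPI would be needed, false if setting
 * need_resched on a polling CPU was enough.
 */
bool toy_kick_ilb(struct toy_cpu *cpu)
{
	cpu->ilb_kick_pending = true;
	if (cpu->polling) {
		cpu->need_resched = true;
		return false;
	}
	return true;
}

/* Models nohz_csd_func; the need_resched test can be switched off. */
void toy_nohz_csd_func(struct toy_cpu *cpu, bool check_need_resched)
{
	if (!cpu->ilb_kick_pending)
		return;
	cpu->ilb_kick_pending = false;
	if (check_need_resched && cpu->need_resched)
		return; /* the kick is dropped: "it's a nope" */
	cpu->softirq_raised = true;
}

/* CPU A side: leave the polling loop and go through the scheduler. */
void toy_idle_exit(struct toy_cpu *cpu, bool check_need_resched)
{
	assert(cpu->need_resched); /* only reason to leave the loop here */
	cpu->polling = false;

	/* Flush the queued call, as do_idle does after polling. */
	toy_nohz_csd_func(cpu, check_need_resched);

	/*
	 * Then schedule. In this toy model a raised softirq means a
	 * ksoftirqd task becomes runnable, so the idle task does not
	 * reach the newly-idle balance; otherwise nothing is runnable
	 * and it does.
	 */
	cpu->need_resched = false;
	cpu->newly_idle_balance = !cpu->softirq_raised;
}
```

Calling toy_kick_ilb() on a CPU with polling set and then
toy_idle_exit(cpu, true) models the current behaviour; passing false
instead models removing the need_resched() test.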

thanks,
julia

>
> >
> > Looking in more detail, do_idle contains the following after exiting the
> > polling loop:
> >
> > flush_smp_call_function_queue();
> > schedule_idle();
> >
> > flush_smp_call_function_queue() does end up calling nohz_csd_func, but
> > this has no impact, because it first checks that need_resched() is false,
> > whereas it is currently true, having been set to cause exiting the polling loop. Removing
> > that test causes:
> >
> > raise_softirq_irqoff(SCHED_SOFTIRQ);
> >
> > but that causes the load balancing code to be executed from a ksoftirqd
> > task, which means that there is now no load imbalance.
> >
> > So the only chance to detect an imbalance does seem to be to have the load
> > balance call be executed by the idle task, via schedule_idle(), as is
> > done currently. But that leads to the core being considered to be newly
> > idle.
> >
> > julia
> >
> >
>