Re: [RFC PATCH 0/2] Adjust CFS loadbalance to adapt QEMU CPU topology.

From: Peter Zijlstra
Date: Fri Jul 21 2023 - 05:15:44 EST


On Fri, Jul 21, 2023 at 10:33:44AM +0200, Vincent Guittot wrote:
> On Fri, 21 Jul 2023 at 04:59, Kenan.Liu <Kenan.Liu@xxxxxxxxxxxxxxxxx> wrote:

> > The SMT topology in the QEMU native x86 CPU model is (0,1),…,(n,n+1),…,
> > but the SMT topology normally seen on a physical machine is (0,n),(1,n+1),…,
> > where n is the total number of cores in the machine.
> >
> > The imbalance happens when the number of runnable threads is less
> > than the number of hyperthreads: select_idle_core() is then called
> > to decide which CPU should run the woken-up task.
> >
> > select_idle_core() returns the checked CPU number if the whole
> > core is idle. Otherwise, if any HT of the core is busy,
> > select_idle_core() clears the whole core from the cpumask and
> > the next core is checked.
> >
> > select_idle_core():
> >     …
> >     if (idle)
> >         return core;
> >
> >     cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
> >     return -1;
> >
> > In this manner, except at the very beginning of the for_each_cpu_wrap() loop,
> > the HT with the even ID is always checked first and is returned
> > to the caller if the whole core is idle, so the odd-numbered HT has
> > almost no chance of being selected.
> >
> > select_idle_cpu():
> >     …
> >     for_each_cpu_wrap(cpu, cpus, target + 1) {
> >         if (has_idle_core) {
> >             i = select_idle_core(p, cpu, cpus, &idle_cpu);
> >
> > This does NOT happen when the SMT topology is (0,n),(1,n+1),…, because
> > when the loop starts from the bottom half of the SMT numbering, HTs with
> > larger numbers are checked first, and when it starts from the top half, their
> > siblings with smaller numbers come first in the per-core search.
>
> But why is it a problem? Your system is almost idle and 1 HT per core
> is used. Who cares whether one HT or the other is selected evenly, as long
> as we prioritize selecting an idle core?

Right, why is this a problem? Hyperthreads are supposed to be symmetric;
it doesn't matter which of the two is active, the important thing is to
only have one active if we can.

(Unlike Power7, which has asymmetric SMT.)
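
For reference, the selection pattern described in the quoted report can be reproduced with a small user-space model. This is only a sketch, not kernel code: `smt_sibling()`, `pick_idle_core()` and `count_odd_picks()` are hypothetical helpers, the scan ignores LLC domains and task affinity, and it assumes exactly one busy CPU (the wakeup target) with the scan starting at target + 1, as in the quoted for_each_cpu_wrap() call.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 64

/* Two CPU numbering schemes for 2-way SMT (hypothetical model). */
enum smt_layout {
	SMT_ADJACENT,	/* siblings are (0,1),(2,3),...   */
	SMT_SPLIT,	/* siblings are (0,n),(1,n+1),... */
};

/* Return the SMT sibling of @cpu; n = ncpus / 2 is the number of cores. */
static int smt_sibling(enum smt_layout layout, int cpu, int ncpus)
{
	if (layout == SMT_ADJACENT)
		return cpu ^ 1;
	return (cpu + ncpus / 2) % ncpus;
}

/*
 * Scan CPUs from @start with wrap-around. Return the first scanned CPU
 * whose whole core is idle. If a core has a busy thread, drop both of
 * its threads from the scan, mimicking the cpumask_andnot() step.
 * Return -1 if no fully idle core exists.
 */
static int pick_idle_core(enum smt_layout layout, const bool *busy,
			  int ncpus, int start)
{
	bool skip[MAX_CPUS] = { false };

	assert(ncpus > 0 && ncpus <= MAX_CPUS && ncpus % 2 == 0);

	for (int i = 0; i < ncpus; i++) {
		int cpu = (start + i) % ncpus;
		int sib = smt_sibling(layout, cpu, ncpus);

		if (skip[cpu])
			continue;
		if (!busy[cpu] && !busy[sib])
			return cpu;
		skip[cpu] = skip[sib] = true;
	}
	return -1;
}

/*
 * For every possible target, mark only the target busy, scan from
 * target + 1, and count how often an odd-numbered CPU is picked.
 */
static int count_odd_picks(enum smt_layout layout, int ncpus)
{
	int odd = 0;

	for (int target = 0; target < ncpus; target++) {
		bool busy[MAX_CPUS] = { false };
		int cpu;

		busy[target] = true;
		cpu = pick_idle_core(layout, busy, ncpus, (target + 1) % ncpus);
		if (cpu >= 0 && (cpu & 1))
			odd++;
	}
	return odd;
}
```

Under these assumptions, with 8 CPUs the adjacent layout never picks an odd-numbered CPU, while the split layout picks one for half of the targets. The model only shows which thread gets chosen; it says nothing about whether that choice affects performance.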