Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs

From: Souradeep Chakrabarti
Date: Wed Jan 10 2024 - 04:08:37 EST


On Tue, Jan 09, 2024 at 08:20:31PM +0000, Haiyang Zhang wrote:
>
>
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> > Sent: Tuesday, January 9, 2024 2:23 PM
> > To: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>; KY Srinivasan
> > <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
> > wei.liu@xxxxxxxxxx; Dexuan Cui <decui@xxxxxxxxxxxxx>;
> > davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> > pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; yury.norov@xxxxxxxxx;
> > leon@xxxxxxxxxx; cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx;
> > vkuznets@xxxxxxxxxx; tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > linux-rdma@xxxxxxxxxxxxxxx
> > Cc: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>; Paul Rosswurm
> > <paulros@xxxxxxxxxxxxx>
> > Subject: RE: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per
> > CPUs
> >
> > From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx> Sent:
> > Tuesday, January 9, 2024 2:51 AM
> > >
> > > From: Yury Norov <yury.norov@xxxxxxxxx>
> > >
> > > Souradeep found that the driver performs better if IRQs are
> > > spread across CPUs using the following heuristics:
> > >
> > > 1. No more than one IRQ per CPU, if possible;
> > > 2. NUMA locality is the second priority;
> > > 3. Sibling dislocality is the last priority.
> > >
> > > Let's consider this topology:
> > >
> > > Node        0               1
> > > Core    0       1       2       3
> > > CPU   0   1   2   3   4   5   6   7
> > >
> > > The most performant IRQ distribution based on the above topology
> > > and heuristics may look like this:
> > >
> > > IRQ   Nodes   Cores   CPUs
> > > 0     1       0       0-1
> > > 1     1       1       2-3
> > > 2     1       0       0-1
> > > 3     1       1       2-3
> > > 4     2       2       4-5
> > > 5     2       3       6-7
> > > 6     2       2       4-5
> > > 7     2       3       6-7
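The distribution in the table above can be reproduced with a small user-space sketch. It assumes the 8-CPU toy topology above (2 nodes, 2 cores per node, 2 SMT siblings per core) and uses plain bitmasks (bit N = CPU N) in place of struct cpumask. spread_irqs(), sibling_mask() and node_mask() are hypothetical names; this is not the patch's code.

```c
/*
 * Toy topology from the example (an assumption for this sketch):
 * 2 NUMA nodes, 2 cores per node, 2 SMT siblings per core.
 * Bit N of a mask stands for CPU N.
 */
#define NR_CPUS        8
#define CPUS_PER_CORE  2
#define CPUS_PER_NODE  4
#define NR_NODES       (NR_CPUS / CPUS_PER_NODE)

/* Mask of the SMT siblings of @cpu, including @cpu itself. */
static unsigned int sibling_mask(int cpu)
{
	int first = cpu - cpu % CPUS_PER_CORE;

	return ((1u << CPUS_PER_CORE) - 1) << first;
}

/* Mask of all CPUs in NUMA node @node. */
static unsigned int node_mask(int node)
{
	return ((1u << CPUS_PER_NODE) - 1) << (node * CPUS_PER_NODE);
}

/*
 * Hypothetical helper: store one affinity mask per IRQ in @affinity.
 * Nodes are visited starting at @local_node (NUMA locality). Each pass
 * over a node gives one IRQ to each core, using the core's whole
 * sibling mask; passes repeat until the node has received one IRQ per
 * CPU. Returns the number of IRQs assigned (at most NR_CPUS).
 */
static int spread_irqs(unsigned int *affinity, int nr_irqs, int local_node)
{
	int irq = 0;

	for (int i = 0; i < NR_NODES; i++) {
		int node = (local_node + i) % NR_NODES;
		int weight = CPUS_PER_NODE;	/* one-IRQ-per-CPU budget */

		while (weight > 0) {
			unsigned int cpus = node_mask(node);

			for (int cpu = 0; cpu < NR_CPUS && weight > 0; cpu++) {
				if (!(cpus & (1u << cpu)))
					continue;
				if (irq == nr_irqs)
					return irq;
				affinity[irq++] = sibling_mask(cpu);
				/* Skip this core's other siblings for the rest of the pass. */
				cpus &= ~sibling_mask(cpu);
				weight--;
			}
		}
	}
	return irq;
}
```

Calling spread_irqs(aff, 8, 0) yields the masks 0x03, 0x0c, 0x03, 0x0c, 0x30, 0xc0, 0x30, 0xc0, which is the CPUs column of the table (0-1, 2-3, 0-1, 2-3, 4-5, 6-7, 4-5, 6-7). Node 0 is used up before node 1 is touched.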
> >
> > I didn't pay attention to the detailed discussion of this issue
> > over the past 2 to 3 weeks during the holidays in the U.S., but
> > the above doesn't align with the original problem as I understood
> > it. I thought the original problem was to avoid putting IRQs on
> > both hyper-threads in the same core, and that the perf
> > improvements are based on that configuration. At least that's
> > what the commit message for Patch 4/4 in this series says.
> >
> > The above chart results in 8 IRQs being assigned to the 8 CPUs,
> > probably with 1 IRQ per CPU. At least on x86, if the affinity
> > mask for an IRQ contains multiple CPUs, matrix_find_best_cpu()
> > should balance the IRQ assignments between the CPUs in the mask.
> > So the original problem is still present because both hyper-threads
> > in a core are likely to have an IRQ assigned.
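The balancing concern can be illustrated with a toy model. It assumes the allocator simply picks the CPU in the affinity mask that carries the fewest IRQs so far, with the lowest CPU winning ties. This is a simplification of what matrix_find_best_cpu() does, not its code. pick_cpu(), place_irqs() and the fixed 8-CPU, 2-siblings-per-core layout are hypothetical.

```c
#define NR_CPUS        8
#define CPUS_PER_CORE  2

/* Pick the CPU in @mask carrying the fewest IRQs; lowest CPU wins ties. */
static int pick_cpu(unsigned int mask, const int *load)
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		if (best < 0 || load[cpu] < load[best])
			best = cpu;
	}
	return best;
}

/*
 * Place each IRQ on one CPU of its affinity mask, updating @load
 * (which may already count IRQs from other devices) and recording the
 * chosen CPU in @effective. Returns the number of cores on which every
 * SMT sibling ends up handling at least one IRQ.
 */
static int place_irqs(const unsigned int *affinity, int nr_irqs,
		      int *load, int *effective)
{
	int busy_cores = 0;

	for (int irq = 0; irq < nr_irqs; irq++) {
		effective[irq] = pick_cpu(affinity[irq], load);
		if (effective[irq] >= 0)
			load[effective[irq]]++;
	}

	for (int core = 0; core < NR_CPUS / CPUS_PER_CORE; core++) {
		int all_busy = 1;

		for (int s = 0; s < CPUS_PER_CORE; s++)
			if (load[core * CPUS_PER_CORE + s] == 0)
				all_busy = 0;
		busy_cores += all_busy;
	}
	return busy_cores;
}
```

In this model, the masks from the first four rows of the table (0-1, 2-3, 0-1, 2-3), starting with no load, put the IRQs on CPUs 0, 2, 1 and 3. Both hyper-threads of cores 0 and 1 then handle an IRQ. Conversely, if CPU 0 already carries an IRQ from another device, a 0-1 mask lets a new IRQ land on CPU 1. A mask containing only CPU 0 leaves no alternative.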
> >
> > Of course, this example has 8 IRQs and 8 CPUs, so assigning an
> > IRQ to every hyper-thread may be the only choice. If that's the
> > case, maybe this just isn't a good example to illustrate the
> > original problem and solution. But even with a better example
> > where the # of IRQs is <= half the # of CPUs in a NUMA node,
> > I don't think the code below accomplishes the original intent.
> >
> > Maybe I've missed something along the way in getting to this
> > version of the patch. Please feel free to set me straight. :-)
> >
> > Michael
>
> I have the same question as Michael. I'm also asking Souradeep
> in another channel: so the algorithm still uses up all CPUs in the
> current NUMA node before moving on to the next NUMA node, right?
>
> Except each IRQ is affinitized to 2 CPUs.
> For example, a system with 2 IRQs:
> IRQ   Nodes   Cores   CPUs
> 0     1       0       0-1
> 1     1       1       2-3
>
> Does this perform better than the algorithm in earlier patches, like below?
> IRQ   Nodes   Cores   CPUs
> 0     1       0       0
> 1     1       1       2
>
The details of this approach were shared by Yury later in this thread.
The main intention of this approach is that the kernel may pick either
sibling for the IRQ.
> Thanks,
> - Haiyang