Re: This is the fourth time I’ve tried to find what led to the regression of outgoing network speed and each time I find the merge commit 8c94ccc7cd691472461448f98e2372c75849406c

From: Mathias Nyman
Date: Wed Feb 21 2024 - 08:43:26 EST


On 21.2.2024 1.43, Randy Dunlap wrote:


On 2/20/24 15:41, Randy Dunlap wrote:
[+ tglx]

(this time for real)


On 2/20/24 15:19, Mikhail Gavrilov wrote:
On Mon, Feb 19, 2024 at 2:41 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:

I installed irqbalance daemon and nothing changed.
So who is responsible for irq balancing?

Sorry for the noise. Can anyone give me an answer?
Who is responsible for distributing interrupts in Linux?
I spotted a network performance regression, and it turned out this was
due to the network card getting a different interrupt. It is a side effect
of commit 57e153dfd0e7a080373fe5853c5609443d97fa5a.

That's a merge commit (AFAIK, maybe not so much). The commit in mainline is:

commit f977f4c9301c
Author: Niklas Neronin <niklas.neronin@xxxxxxxxxxxxxxx>
Date: Fri Dec 1 17:06:40 2023 +0200

xhci: add handler for only one interrupt line

Installing irqbalance daemon did not help. Maybe someone experienced
such a problem?


Thomas, would you look at this, please?

A network device and xhci (USB) driver are now sharing interrupts.
This causes a large performance decrease for the networking device.

Short recap:

The xhci (USB) and network devices didn't share interrupts, or even interrupt
the same CPU, in either the good or the bad case.

A change in how many interrupts the xhci driver requests changed which CPU
the network device interrupts.

In the bad case Mikhail Gavrilov's network device was interrupting CPU0
together with:
- IR-IO-APIC 2-edge timer
- IR-PCI-MSIX-0000:07:00.0 1-edge nvme1q1

In the good case the network device was interrupting CPU27 together with:
- IR-PCI-MSIX-0000:04:00.0 27-edge nvme0q27
- IR-PCI-MSIX-0000:07:00.0 28-edge nvme1q28
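
To see which CPU a given IRQ is landing on, the per-CPU counters in
/proc/interrupts can be compared. A minimal sketch in Python, assuming the
usual layout (a header row of CPU names, then one row per IRQ with one
counter per CPU). The function name busiest_cpu and the sample text,
including its device names and counts, are made up for illustration and are
not taken from the affected machine:

```python
def busiest_cpu(interrupts_text, irq):
    """Return the index of the CPU with the highest count for `irq`.

    `interrupts_text` is text in the layout of /proc/interrupts.
    """
    lines = interrupts_text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == str(irq):
            # The first ncpus fields after the label are per-CPU counters.
            counts = [int(field) for field in rest.split()[:ncpus]]
            return max(range(ncpus), key=counts.__getitem__)
    raise KeyError(f"IRQ {irq} not found")


# Hypothetical sample, not real data from this report.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:         10          0          0          0  IR-IO-APIC    2-edge      timer
 87:          5          0        900          0  IR-PCI-MSIX-example 0-edge  example-nic
"""

print(busiest_cpu(SAMPLE, 87))
```

On a live system the text would come from open("/proc/interrupts").read().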

Manually moving the network device's IRQ 87 from CPU0 to CPU23 helped.
(echo 800000 > /proc/irq/87/smp_affinity)
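
The value written to smp_affinity is a hexadecimal CPU bitmask, so 800000
(bit 23 set) selects CPU23. A small sketch in Python of how such a mask can be
derived for a single CPU and decoded again. The helper names are hypothetical,
and the comma-separated 32-bit grouping used for CPUs 32 and above is my
understanding of the format the kernel prints for wide masks, so treat that
part as an assumption:

```python
def cpu_to_smp_affinity(cpu):
    """Return the hex mask string that selects only `cpu`."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    mask = 1 << cpu
    groups = []
    while mask:
        # Split the mask into 32-bit words, least significant first.
        groups.append(format(mask & 0xFFFFFFFF, "08x"))
        mask >>= 32
    groups.reverse()
    groups[0] = groups[0].lstrip("0")      # no zero padding on the top word
    return ",".join(groups)


def smp_affinity_to_cpus(mask_text):
    """Return the list of CPU numbers set in a hex mask string."""
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


print(cpu_to_smp_affinity(23))             # mask for CPU23
print(smp_affinity_to_cpus("800000"))      # CPUs selected by the mask above
```

Writing the resulting string to /proc/irq/<n>/smp_affinity needs root, as in
the echo command above.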

Thanks
-Mathias