Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs

From: John Garry
Date: Thu Mar 19 2020 - 08:32:09 EST


On 16/03/2020 11:54, Marc Zyngier wrote:
> When mapping an LPI, the ITS driver picks the first possible
> affinity, which is in most cases CPU0, assuming that if
> that's not suitable, someone will come and set the affinity
> to something more interesting.
>
> That apparently isn't the case, and people complain of poor
> performance when many interrupts are glued to the same CPU.
> So let's place the interrupts by finding the "least loaded"
> CPU (that is, the one that has the fewest LPIs mapped to it).
>
> So-called 'managed' interrupts are an interesting case where
> the affinity is actually dictated by the kernel itself, and
> we should honor this.
>
> * From v2:
>   - Split accounting from CPU selection
>   - Track managed and unmanaged interrupts separately
>
> Marc Zyngier (2):
>   irqchip/gic-v3-its: Track LPI distribution on a per CPU basis
>   irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
>
>  drivers/irqchip/irq-gic-v3-its.c | 153 +++++++++++++++++++++++++------
>  1 file changed, 127 insertions(+), 26 deletions(-)

Hi Marc,

Initial results look good. We have 3x NVMe drives now, as opposed to 2x previously, which is better for this test.

Before: ~1.3M IOPS fio read
After: ~1.8M IOPS fio read

So a ~40% gain in throughput.

We also tried NVMe with nvme.use_threaded_interrupts=1. As you may remember, NVMe interrupt handling can cause lockups, since by default the driver handles all completions in interrupt context.
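
For completeness, and if I remember the nvme driver correctly, use_threaded_interrupts is a read-only module parameter, so it has to be given at boot or module load time rather than flipped at runtime, e.g.:

  nvme.use_threaded_interrupts=1            (kernel command line, driver built in)
  modprobe nvme use_threaded_interrupts=1   (driver loaded as a module)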

Before: ~1.2M IOPS fio read
After: ~1.2M IOPS fio read

So those were about the same. I would have hoped for an improvement here, considering that previously all the per-queue threaded handlers would have been running on the single CPU that handled the hard IRQ.

But we will retest all of this tomorrow, so please treat these numbers as provisional for now.

Thanks to Luo Jiaxing for testing.

Cheers,
John