[PATCH 0/5] Partitioning per-cpu interrupts

From: Marc Zyngier
Date: Mon Apr 11 2016 - 04:58:10 EST


We've unfortunately started seeing a situation where percpu interrupts
are partitioned in the system: one arbitrary set of CPUs has an
interrupt connected to a type of device, while another disjoint set of
CPUs has the same interrupt connected to another type of device.

This makes it impossible to have a device driver requesting this
interrupt using the current percpu-interrupt abstraction, as the same
interrupt number is now potentially claimed by at least two drivers,
and we forbid interrupt sharing on per-cpu interrupts.

A potential solution to this has been proposed by Will Deacon,
expanding the handling in the core code:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/388800.html

followed by a counter-proposal from Thomas Gleixner. Will tried to
implement it, but ran into issues: the probing code runs in
preemptible context, which makes the percpu-ness of interrupts
difficult to guarantee.

Another approach to this is to turn things upside down. Let's assume
that our system describes all the possible partitions for a given
interrupt, and give each of them a unique identifier. It is then
possible to create a namespace where the affinity identifier itself is
a form of interrupt number. At this point, it becomes easy to
implement a set of partitions as a cascaded irqchip, each affinity
identifier being the secondary HW irq, as outlined in the following
example:

Aff-0: { cpu0 cpu3 }
Aff-1: { cpu1 cpu2 }
Aff-2: { cpu4 cpu5 cpu6 cpu7 }

Let's assume that HW interrupt 1 is partitioned over these 3
affinities. When HW interrupt 1 fires on a given CPU, all it takes is
to find out which affinity this CPU belongs to, which gives us a new
HW interrupt number. Bingo. Of course, this only works as long as you
don't have overlapping affinities (but if you do, your system is broken
anyway).
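
To make the lookup concrete, here is a rough userspace sketch of the
idea, not the code from this series. The structure, the bitmask
representation and the function names (ppi_partition,
cpu_to_partition, partitions_are_disjoint) are all made up for
illustration; the table simply encodes the three affinities listed
above, and the index of the matching partition plays the role of the
secondary HW interrupt number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* assumption: the 8 CPUs of the example */

/* Hypothetical partition descriptor: one bit per CPU in the partition. */
struct ppi_partition {
	uint32_t cpu_mask;
};

/* The three affinities from the example. */
static const struct ppi_partition partitions[] = {
	{ .cpu_mask = (1u << 0) | (1u << 3) },				/* Aff-0 */
	{ .cpu_mask = (1u << 1) | (1u << 2) },				/* Aff-1 */
	{ .cpu_mask = (1u << 4) | (1u << 5) | (1u << 6) | (1u << 7) },	/* Aff-2 */
};

#define NR_PARTITIONS (sizeof(partitions) / sizeof(partitions[0]))

/*
 * Return the index of the partition containing @cpu, which is used as
 * the secondary HW interrupt number, or -1 if no partition claims it.
 */
static int cpu_to_partition(unsigned int cpu)
{
	size_t i;

	assert(cpu < NR_CPUS);

	for (i = 0; i < NR_PARTITIONS; i++) {
		if (partitions[i].cpu_mask & (1u << cpu))
			return (int)i;
	}

	return -1;
}

/* Return 1 if no CPU appears in more than one partition, 0 otherwise. */
static int partitions_are_disjoint(void)
{
	uint32_t seen = 0;
	size_t i;

	for (i = 0; i < NR_PARTITIONS; i++) {
		if (seen & partitions[i].cpu_mask)
			return 0;
		seen |= partitions[i].cpu_mask;
	}

	return 1;
}
```

With this table, HW interrupt 1 firing on cpu3 resolves to partition 0
(Aff-0), and on cpu5 to partition 2 (Aff-2). The disjointness check
corresponds to the "no overlapping affinities" requirement.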

This allows us to keep a number of nice properties:

- Each partition results in a separate percpu-interrupt (with a
restricted affinity), which keeps drivers happy. This alone
guarantees that we do not have to change the programming model for
per-cpu interrupts.

- Because the underlying interrupt is still per-cpu, the overhead of
the indirection can be kept pretty minimal.

- The core code can ignore most of that crap.

For that purpose, we implement a small library that deals with some of
the boilerplate code, relying on platform-specific drivers to provide
a description of the affinity sets and a set of callbacks. This also
relies on a small change in the irqdomain layer, and now offers a way
for the affinity of a percpu interrupt to be retrieved by a driver.

As an example, the GICv3 driver has been adapted to use this new
feature. Patches on top of v4.6-rc3, tested on an arm64 FVP model.

Marc Zyngier (5):
irqdomain: Allow domain matching on irq_fwspec
genirq: Allow the affinity of a percpu interrupt to be set/retrieved
irqchip: Add per-cpu interrupt partitioning library
irqchip/gic-v3: Add support for partitioned PPIs
DT: arm,gic-v3: Documment PPI partition support

.../bindings/interrupt-controller/arm,gic-v3.txt | 34 ++-
drivers/irqchip/Kconfig | 4 +
drivers/irqchip/Makefile | 1 +
drivers/irqchip/irq-gic-v3.c | 176 +++++++++++++-
drivers/irqchip/irq-partition-percpu.c | 256 +++++++++++++++++++++
include/linux/irq.h | 4 +
include/linux/irqchip/irq-partition-percpu.h | 59 +++++
include/linux/irqdesc.h | 1 +
include/linux/irqdomain.h | 15 +-
kernel/irq/irqdesc.c | 26 ++-
kernel/irq/irqdomain.c | 19 +-
11 files changed, 580 insertions(+), 15 deletions(-)
create mode 100644 drivers/irqchip/irq-partition-percpu.c
create mode 100644 include/linux/irqchip/irq-partition-percpu.h

--
2.1.4