Re: [PATCH v10 00/15] Linux RISC-V AIA Support

From: Anup Patel
Date: Thu Oct 19 2023 - 12:11:27 EST


On Thu, Oct 19, 2023 at 7:13 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> Hi Anup,
>
> Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:
>
> > The RISC-V AIA specification is ratified as-per the RISC-V international
> > process. The latest ratified AIA specifcation can be found at:
> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >
> > At a high-level, the AIA specification adds three things:
> > 1) AIA CSRs
> > - Improved local interrupt support
> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> > - Per-HART MSI controller
> > - Support MSI virtualization
> > - Support IPI along with virtualization
> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> > - Wired interrupt controller
> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> > - In Direct-mode, injects external interrupts directly into HARTs
>
> Thanks for working on the AIA support! I had a look at the series, and
> have some concerns about interrupt ID abstraction.
>
> A bit of background, for readers not familiar with the AIA details.
>
> IMSIC allows for 2047 unique MSI ("msi-irq") sources per hart, and
> each MSI is dedicated to a certain hart. The series takes the approach
> to say that there are, e.g., 2047 interrupts ("lnx-irq") globally.
> Each lnx-irq consists of #harts * msi-irq -- a slice -- and in the
> slice only *one* msi-irq is acutally used.
>
> This scheme makes affinity changes more robust, because the interrupt
> sources on "other" harts are pre-allocated. On the other hand it
> requires to propagate irq masking to other harts via IPIs (this is
> mostly done up setup/tear down). It's also wasteful, because msi-irqs
> are hogged, and cannot be used.
>
> Contemporary storage/networking drivers usually uses queues per core
> (or a sub-set of cores). The current scheme wastes a lot of msi-irqs.
> If we instead used a scheme where "msi-irq == lnx-irq", instead of
> "lnq-irq = {hart 0;msi-irq x , ... hart N;msi-irq x}", there would be
> a lot MSIs for other users. 1-1 vs 1-N. E.g., if a storage device
> would like to use 5 queues (5 cores) on a 128 core system, the current
> scheme would consume 5 * 128 MSIs, instead of just 5.
>
> On the plus side:
> * Changing interrupts affinity will never fail, because the interrupts
> on each hart is pre-allocated.
>
> On the negative side:
> * Wasteful interrupt usage, and a system can potientially "run out" of
> interrupts. Especially for many core systems.
> * Interrupt masking need to proagate to harts via IPIs (there's no
> broadcast csr in IMSIC), and a more complex locking scheme IMSIC
>
> Summary:
> The current series caps the number of global interrupts to maximum
> 2047 MSIs for all cores (whole system). A better scheme, IMO, would be
> to expose 2047 * #harts unique MSIs.
>
> I think this could simplify/remove(?) the locking as well.

Exposing 2047 * #harts unique MSIs has multiple issues:
1) The irq_set_affinity() does not work for MSIs because each
IRQ is not tied to a particular HART. This means we can't
balance the IRQ processing load among HARTs.
2) All wired IRQs for APLIC MSI-mode will also target a
fixed HART hence irq_set_affinity() won't work for wired
IRQs as well.
3) Contemporary storage/networking drivers which use per-core
queues use irq_set_affinity() on queue IRQs to balance
across cores but this will fail.
4) HART hotplug breaks because kernel irq-subsystem can't
migrate the IRQs (both MSIs and Wired) targeting HART X
to another HART Y when the HART X goes down.

The idea of treating per-HART MSIs as separate IRQs has
been discussed in the past. The current approach works nicely
with all kernel use-cases at the cost of additional work on the
driver side.

Also, the current approach is very similar to the ARM GICv3
driver where ITS LPIs across CPUs are treated as single IRQ.

>
> I realize that the series in v10, and coming with a change like this
> now might be a bit of a pain...
>
> Finally, another question related to APLIC/IMSIC. AFAIU the memory map
> of the IMSIC regions are constrained by the APLIC, which requires a
> certain layout for MSI forwarding (group/hart/guest bits). Say that a
> system doesn't have an APLIC, couldn't the layout requirement be
> simplified?

Yes, this is already taken care of in the current IMSIC driver based
on feedback from Atish. We can certainly improve flexibility on the
IMSIC driver side if some case is missed-out.

The APLIC driver is certainly very strict about the arrangement of
IMSIC files so we do additional sanity checks on the APLIC driver
side at the time of probing.

>
>
> Again, thanks for the hard work!
> Björn

Regards,
Anup