Re: [PATCH v10 00/15] Linux RISC-V AIA Support

From: Björn Töpel
Date: Fri Oct 20 2023 - 04:47:41 EST

Next message: Raag Jadav: "[PATCH v1 0/8] Refine _UID references across kernel"
Previous message: Ingo Molnar: "Re: [PATCH v2] x86/mm: Drop 4MB restriction on minimal NUMA node memory size"
In reply to: Anup Patel: "Re: [PATCH v10 00/15] Linux RISC-V AIA Support"
Next in thread: Anup Patel: "Re: [PATCH v10 00/15] Linux RISC-V AIA Support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Thanks for the quick reply!

Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:

> On Thu, Oct 19, 2023 at 7:13 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>>
>> Hi Anup,
>>
>> Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:
>>
>> > The RISC-V AIA specification is ratified as-per the RISC-V international
>> > process. The latest ratified AIA specifcation can be found at:
>> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>> >
>> > At a high-level, the AIA specification adds three things:
>> > 1) AIA CSRs
>> > - Improved local interrupt support
>> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>> > - Per-HART MSI controller
>> > - Support MSI virtualization
>> > - Support IPI along with virtualization
>> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
>> > - Wired interrupt controller
>> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
>> > - In Direct-mode, injects external interrupts directly into HARTs
>>
>> Thanks for working on the AIA support! I had a look at the series, and
>> have some concerns about interrupt ID abstraction.
>>
>> A bit of background, for readers not familiar with the AIA details.
>>
>> IMSIC allows for 2047 unique MSI ("msi-irq") sources per hart, and
>> each MSI is dedicated to a certain hart. The series takes the approach
>> to say that there are, e.g., 2047 interrupts ("lnx-irq") globally.
>> Each lnx-irq consists of #harts * msi-irq -- a slice -- and in the
>> slice only *one* msi-irq is acutally used.
>>
>> This scheme makes affinity changes more robust, because the interrupt
>> sources on "other" harts are pre-allocated. On the other hand it
>> requires to propagate irq masking to other harts via IPIs (this is
>> mostly done up setup/tear down). It's also wasteful, because msi-irqs
>> are hogged, and cannot be used.
>>
>> Contemporary storage/networking drivers usually uses queues per core
>> (or a sub-set of cores). The current scheme wastes a lot of msi-irqs.
>> If we instead used a scheme where "msi-irq == lnx-irq", instead of
>> "lnq-irq = {hart 0;msi-irq x , ... hart N;msi-irq x}", there would be
>> a lot MSIs for other users. 1-1 vs 1-N. E.g., if a storage device
>> would like to use 5 queues (5 cores) on a 128 core system, the current
>> scheme would consume 5 * 128 MSIs, instead of just 5.
>>
>> On the plus side:
>> * Changing interrupts affinity will never fail, because the interrupts
>> on each hart is pre-allocated.
>>
>> On the negative side:
>> * Wasteful interrupt usage, and a system can potientially "run out" of
>> interrupts. Especially for many core systems.
>> * Interrupt masking need to proagate to harts via IPIs (there's no
>> broadcast csr in IMSIC), and a more complex locking scheme IMSIC
>>
>> Summary:
>> The current series caps the number of global interrupts to maximum
>> 2047 MSIs for all cores (whole system). A better scheme, IMO, would be
>> to expose 2047 * #harts unique MSIs.
>>
>> I think this could simplify/remove(?) the locking as well.
>
> Exposing 2047 * #harts unique MSIs has multiple issues:
> 1) The irq_set_affinity() does not work for MSIs because each
> IRQ is not tied to a particular HART. This means we can't
> balance the IRQ processing load among HARTs.

Yes, you can balance. In your code, each *active* MSI is still
bound/active to a specific hard together with the affinity mask. In an
1-1 model you would still need to track the affinity mask, but the
irq_set_affinity() would be different. It would try to allocate a new
MSI from the target CPU, and then switch to having that MSI active.

That's what x86 does AFAIU, which is also constrained by the # of
available MSIs.

The downside, as I pointed out, is that the set affinity action can
fail for a certain target CPU.

> 2) All wired IRQs for APLIC MSI-mode will also target a
> fixed HART hence irq_set_affinity() won't work for wired
> IRQs as well.

I'm not following here. Why would APLIC put a constraint here? I had a
look at the specs, and I didn't see anything supporting the current
scheme explicitly.

> 3) Contemporary storage/networking drivers which use per-core
> queues use irq_set_affinity() on queue IRQs to balance
> across cores but this will fail.

Or via the the managed interrupts. But this is a non-issue, as pointed
out in my reply to 1.

> 4) HART hotplug breaks because kernel irq-subsystem can't
> migrate the IRQs (both MSIs and Wired) targeting HART X
> to another HART Y when the HART X goes down.

Yes, we might end up in scenarios where we can't move to a certain
target cpu, but I wouldn't expect that to be a common scenario.

> The idea of treating per-HART MSIs as separate IRQs has
> been discussed in the past.

Aha! I tried to look for it in lore, but didn't find any. Could you
point me to those discussions?

> Also, the current approach is very similar to the ARM GICv3 driver
> where ITS LPIs across CPUs are treated as single IRQ.

I'm not familiar with the GIC. Is the GICv3 design similar to IMSIC? I
had the impression that the GIC had a more advanced interrupt routing
mechanism, than what IMSIC exposes. I think x86 APIC takes the 1-1
approach (the folks on the To: list definitely knows! ;-)).

My concern is interrupts become a scarce resource with this
implementation, but maybe my view is incorrect. I've seen bare-metal
x86 systems (no VMs) with ~200 cores, and ~2000 interrupts, but maybe
that is considered "a lot of interrupts".

As long as we don't get into scenarios where we're running out of
interrupts, due to the software design.

Björn

Next message: Raag Jadav: "[PATCH v1 0/8] Refine _UID references across kernel"
Previous message: Ingo Molnar: "Re: [PATCH v2] x86/mm: Drop 4MB restriction on minimal NUMA node memory size"
In reply to: Anup Patel: "Re: [PATCH v10 00/15] Linux RISC-V AIA Support"
Next in thread: Anup Patel: "Re: [PATCH v10 00/15] Linux RISC-V AIA Support"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]