Re: [RFC] killing the NR_IRQS arrays.

From: Benjamin Herrenschmidt
Date: Sun Feb 18 2007 - 14:58:50 EST



> Because I don't have something better to replace them with.
>
> We need names for irqs, currently the kernel/user space interface
> is a unsigned number.

.../...

Ok, as long as it's strictly a userspace issue, I understand.

> The model can be made to work if you force it but it isn't really
> a good fit.
>
> I can't really use the (cpu#, vector#) tuple as hw number as it
> varies at runtime, and a single interrupt can send different (cpu#,
> vector#) tuples from one interrupt message to the next without being
> reprogrammed. At least I don't have the impression that you support
> multiple hardware numbers going to the same linux irq. But this
> really is the layer where I need the reverse mapping. However I can
> optimize the reverse mapping by taking advantage of the per cpu
> nature.
>
> Currently the hardware number that I use is the pin number on
> the ioapic. And to form the linux irq I just add the number
> of pins of all previous ioapics together and then add my pin number.
> Fairly simple.

Ok.

> Doing the above gives me stable names that are the same from one boot
> to the next if someone doesn't change how the hardware is put
> together. It looks to me that if I adapt the ppc scheme my irq
> numbers will change from one boot to the next one kernel to the next,
> almost at random.

Not necessarily. The current code would though in practice, it doesn't
change much, but as I said, it would be trivial to change it so that a
domain using a linear reverse map is fully "allocated" when initialized.

Stable numbers are somewhat useful for users, thus one of the thing the
ppc code tries to do in order to get "mostly" stable numbers/names as
well is to try to use the HW number as a "starting" value when searching
for a linux irq number to assign. Only if it's not possible (the HW
number is in the reserved ISA range, too big, or the matching linux
number is already used) then it will allocate a new one.

I still think I need to add the domain and HW number to /proc/interrupts
though (or to some other sysfs file) in order to provide the mappign to
virqs to userland.

Also, if the eixsting linear and radix tree reverse maps don't fit your
needs, you can create a new one or request no reverse map from the core
at all and do everything in your arch code.

Basically, what I'm saying is that I'd like the concept of domain &
generic remappers to become generic. sparc64 and ppc use it and I'm
convinced it would be useful to other especially archs with lots of
cascaded PICs like ARM. And by doing so, we can also make it easier to
expose the HW -> virq mapping to userland in a common place with a
consistent format since it would be done by generic code.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/