Re: [patch 46/47] powerpc: Use new irq allocator

From: Benjamin Herrenschmidt
Date: Sun Oct 03 2010 - 18:58:17 EST


On Sun, 2010-10-03 at 09:53 -0700, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:
>
> >> That would make things much cleaner and in fact move one large step
> >> toward being able to make powerpc virq scheme generic, which seems to be
> >> a good idea from what I've heard :-)
> >
> > Yep.
>
> I'm not certain about making the ppc virq scheme generic. Maybe it is
> just my distorted impression but I have the understanding that ppc irq
> numbers mean nothing and are totally unstable whereas on x86 irq numbers
> in general are stable (across kernel upgrades and changes in device
> probe order) and the irq number has a useful hardware meaning. Which
> means you don't have to go through several layers of translation tables
> to figure out which hardware pin you are talking about.

In addition to Thomas' comments, it's actually more complex than that :-)

Even assuming that what you say is true, that's only part of the story.
And last I looked at my x86 machine, it's not: x86 remaps "GSI" numbers
and the results don't always seem entirely predictable. HT interrupts
make it worse and MSIs just completely kill your argument :-)

Some setups have stable numbers, some don't. Hypervisors can return you
crazy HW interrupt numbers, etc...

However, remapping arbitrary crazy HW numbers is only one aspect of the
powerpc virq scheme (typically for IRQ domains using the radix-tree-based
reverse map).

The main deal I'd say is that in embedded land (and to some extent I
suspect that's going to happen more with x86), you quickly end up with
multiple interrupt domains, via cascaded controllers of all kinds etc...

In fact, I've been in situations where I want to be able to hot plug
entire PICs.

At this point, you end up needing -some- kind of scheme to map the Linux
IRQ numbers to HW numbers. The "old way" to do that tends to be by
assigning fixed ranges of numbers. This somewhat works, but it is a bit
clumsy, not very dynamic, and not suited for hotpluggable stuff. It
generally requires the platform code to know about everything and
declare such ranges, etc...
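
To illustrate the "old way", here is a minimal userspace sketch of a
fixed-range scheme. It is not taken from any real platform code: the
names (fixed_range, platform_ranges, fixed_hw_to_linux,
fixed_linux_to_hw) and the range values are made up for the example. The
point is only that the table has to be declared up front by code that
knows about every PIC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: each PIC owns a fixed, platform-declared slice
 * of the Linux IRQ number space. */
struct fixed_range {
	const char *pic_name;	/* illustrative label only */
	unsigned int base;	/* first Linux IRQ number of the slice */
	unsigned int count;	/* number of HW sources on this PIC */
};

/* The platform code must know every PIC in advance (made-up values). */
static const struct fixed_range platform_ranges[] = {
	{ "main-pic",    16, 64 },
	{ "cascade-pic", 80, 32 },
};

#define NR_RANGES (sizeof(platform_ranges) / sizeof(platform_ranges[0]))

/* Linux IRQ for (pic, hwirq); 0 means "no irq" (bad pic or hwirq). */
static unsigned int fixed_hw_to_linux(size_t pic, unsigned int hwirq)
{
	if (pic >= NR_RANGES || hwirq >= platform_ranges[pic].count)
		return 0;
	return platform_ranges[pic].base + hwirq;
}

/* Reverse lookup: returns 0 and fills *pic / *hwirq, or -1 if the
 * Linux IRQ number falls outside every declared range. */
static int fixed_linux_to_hw(unsigned int irq, size_t *pic,
			     unsigned int *hwirq)
{
	assert(pic && hwirq);

	for (size_t i = 0; i < NR_RANGES; i++) {
		const struct fixed_range *r = &platform_ranges[i];

		if (irq >= r->base && irq - r->base < r->count) {
			*pic = i;
			*hwirq = irq - r->base;
			return 0;
		}
	}
	return -1;
}
```

A PIC that shows up later (hotplug) has no slot in platform_ranges
unless somebody reserved one for it ahead of time, which is exactly the
clumsy part.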

Now, if the stability of the numbers is a problem for you, there are a
few easy things to do to solve that:

- First, and we do that today on powerpc, we reserve 1...15 as "legacy"
and only a PIC that claims to be "legacy" can claim them (for us that
means some kind of 8259). So your old style legacy x86 IRQs can remain
there if you want to.

- In systems with one domain, we often end up with virq == hwirq since
we try to allocate the same number "by default". That's probably what
happens today with GSI on my x86 box here.

- Then, while powerpc allocates virq numbers when irqs are mapped, which
can be quite "late", it could be perfectly kosher to imagine a way for
"child" PICs to instead instantiate the mapping of their whole range
early. That way, their virq numbers remain contiguous, providing a
simpler 1:N mapping, and in embedded systems, you'll probably end up
with the same mapping on every boot.

- Apart from the risk of breaking crap that parses /proc/interrupts,
adding the HW irq information there would be trivial and solve your
problem.
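
The first three points can be sketched roughly as below. This is a toy
userspace model, not the actual powerpc implementation: the table size,
the struct layout and the function names (virq_create_mapping,
virq_map_range) are assumptions made for the example. It reserves
1...15 for a legacy PIC, tries virq == hwirq first for everybody else,
and lets a child PIC grab a contiguous block for its whole range early.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VIRQ		64	/* toy size of the virq space */
#define NR_LEGACY_VIRQ	16	/* 1..15 reserved for legacy; 0 = no irq */

struct virq_entry {
	bool used;
	int domain;		/* hypothetical domain id */
	unsigned int hwirq;
};

static struct virq_entry virq_map[NR_VIRQ];

/* Map (domain, hwirq) to a virq. Returns 0 on failure. */
static unsigned int virq_create_mapping(int domain, bool is_legacy,
					unsigned int hwirq)
{
	unsigned int virq, v;

	/* An existing mapping is simply returned again. */
	for (v = 1; v < NR_VIRQ; v++)
		if (virq_map[v].used && virq_map[v].domain == domain &&
		    virq_map[v].hwirq == hwirq)
			return v;

	if (is_legacy) {
		/* Only a legacy PIC may claim 1..15, as virq == hwirq. */
		if (hwirq == 0 || hwirq >= NR_LEGACY_VIRQ ||
		    virq_map[hwirq].used)
			return 0;
		virq = hwirq;
	} else {
		virq = 0;
		/* Hint: try virq == hwirq first, outside the legacy range. */
		if (hwirq >= NR_LEGACY_VIRQ && hwirq < NR_VIRQ &&
		    !virq_map[hwirq].used)
			virq = hwirq;
		/* Otherwise take the first free non-legacy number. */
		for (v = NR_LEGACY_VIRQ; !virq && v < NR_VIRQ; v++)
			if (!virq_map[v].used)
				virq = v;
		if (!virq)
			return 0;
	}

	virq_map[virq] = (struct virq_entry){ true, domain, hwirq };
	return virq;
}

/* Map hwirq 0..count-1 of a child PIC to one contiguous virq block,
 * up front. Returns the base virq, or 0 if no block is free. */
static unsigned int virq_map_range(int domain, unsigned int count)
{
	unsigned int base, i;

	if (count == 0 || count > NR_VIRQ - NR_LEGACY_VIRQ)
		return 0;

	for (base = NR_LEGACY_VIRQ; base + count <= NR_VIRQ; base++) {
		for (i = 0; i < count; i++)
			if (virq_map[base + i].used)
				break;
		if (i < count)
			continue;	/* block not free, slide along */
		for (i = 0; i < count; i++)
			virq_map[base + i] =
				(struct virq_entry){ true, domain, i };
		return base;
	}
	return 0;
}
```

With a range mapped early, translating inside that child PIC is just
base + hwirq, and as long as the PICs are set up in the same order the
numbers come out the same on every boot.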

So overall, I don't see a problem at all. And it makes handling of
arbitrary combinations of interrupt domains (cascaded PICs) very very
easy indeed.

Cheers,
Ben.
