Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors

From: Alan Mayer
Date: Fri Aug 01 2008 - 17:49:36 EST

Next message: Paul Mundt: "Re: sh O= builds broken"
Previous message: Sven Wegener: "[PATCH] leds: Move triggers before drivers in Makefile"
In reply to: Eric W. Biederman: "Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors"
Next in thread: Eric W. Biederman: "Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Okay, I think we have it now. assign_irq_vector *almost* does what we need. One minor thing is that assign_irq_vector ANDs against cpu_online_map. We would need cpu_possible_map, so we get the vector on offline cpus that may come online. The other thing is that assign_irq_vector doesn't allow the specification of interrupt priorities. It would need to be modified to handle returning either a high priority vector or a low priority vector. Would modifying the api for assign_irq_vector be the proper approach?

The interrupts don't necessarily fire on all cpus, it's just that they *can* fire on any cpu. For example, the GRU triggers an interrupt (it is very IPI'ish) to a particular cpu in the event of a GRU TLB fault. That cpu handles the fault and returns. But the fault can happen on any cpu, so all cpus need to be registered for the same vector and irq. This is probably splitting hairs; it is certainly no different in principal from timer interrupts or processor TLB faults.

As far as kernel_stat is concerned. I see you're point. NR_CPUS on our machines is going to be big (4K? 8K? something like that). NR_IRQS is also going to big because of that. It's unfortunate since the actual number of interrupt sources is going to be an order of magnitude smaller, at least.

--ajm

Eric W. Biederman wrote:

Cliff Wickman <cpw@xxxxxxx> writes:
I assume you mean create_irq() and destroy_irq().
They are close to what we need.

Actually that isn't quite what I was thinking.

However for any given system use:
- we need to request a high priority vector for some irq's, rather than
one randomly allocated as per __assign_irq_vector().

Well I was actually thinking mostly about reusing assign_irq_vector.

The high priority vector seems to be the place where we would need
to reexamine the vector allocator.

- we want the irq/vector to be targeted to all cpu's (as specified in a
mask,
and can include currently offline cpu's) rather than a single cpu.

__assign_irq_vector should be able to handle that case.

I suppose those abilities could be added to create_irq(), but we didn't
want to intrude into that interface.

Right. That part gets custom and there is no intersection.

A smaller consideration is simplicity of use. We want any such user to
use
the generic do_IRQ() flow (not alloc_intr_gate()).

My primary concern is that you are generalizing an interface that is
designed for alloc_intr_gate consumers. In particular for people
who wish to allocate vectors that can be triggered by software to
do this.

If this is normal irq handling I would really prefer to use the
normal data structures that we use for allocating vectors to irqs,
because that is what you are doing.

But make it easy to set
up the irq/vector, irq_chip and irq_desc without getting intimate with
the details.
I suppose some other wrapper for an enhanced create_irq() could be done.

We are going to need such irq/vector pairs for a couple of UV drivers
(drivers/misc/sgi-gru/ and sgi-xp/). And would prefer it for the UV TLB
shootdown (x86/kernel/tlb.uv.c) rather than using alloc_intr_gate().

I also want to ask why do you need an irq that fires on every cpu?

That concept seems to be responsible for one of the nastiest data
structures in the kernel. Where we track how many times any irq has
occurred on any cpu. Look at the percpu variable kernel_stat. It
seems to be one of the nastiest variables I know of. Of size:
NR_CPUS*NR_CPUS*32 Assuming you have a machine with a reasonable
number of irqs per cpu.

Eric

--
Make me an angel
That flies from Montgomery.
--
Alan J. Mayer
SGI
ajm@xxxxxxx
WORK: 651-683-3131
HOME: 651-407-0134
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Paul Mundt: "Re: sh O= builds broken"
Previous message: Sven Wegener: "[PATCH] leds: Move triggers before drivers in Makefile"
In reply to: Eric W. Biederman: "Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors"
Next in thread: Eric W. Biederman: "Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]