Re: [PATCH] x86_64: Dynamically allocate arch specific system vectors

From: Mike Travis
Date: Mon Aug 04 2008 - 16:39:52 EST




Alan Mayer wrote:
>
>
> Eric W. Biederman wrote:
>> Alan Mayer <ajm@xxxxxxx> writes:
>>
>>> Okay, I think we have it now. assign_irq_vector *almost* does what
>>> we need.
>>> One minor thing is that assign_irq_vector ANDs against
>>> cpu_online_map. We would
>>> need cpu_possible_map, so we get the vector on offline cpus that may
>>> come
>>> online. The other thing is that assign_irq_vector doesn't allow the
>>> specification of interrupt priorities. It would need to be modified
>>> to handle
>>> returning either a high priority vector or a low priority vector. Would
>>> modifying the api for assign_irq_vector be the proper approach?
>>
>> I don't know if it makes sense to modify assign_irq_vector or to have
>> a companion function that uses the same data structures.
>>
>> I think I would work on the companion function and if the code
>> can be made sufficiently similar merge the two functions.
>>
> Okay, If I understand you, here's what we can do. We currently have
> this function that does pretty much what the combination of create_irq()
> and __assign_irq_vector() do. We can accomplish the same thing that our
> routine does using create_irq() and __assign_irq_vector() do if we make
> the following changes:
>
> __assign_irq_vector(int irq, cpumask_t mask) ==>
> __assign_irq_vector(int irq, cpumask_t mask, int priority);
>
> priority has three values: priority_none, priority_low, priority_high
> priority_none means do everything the way it is done now.
> priority_low means do everything the way its is done now, except use
> cpu_possible_map rather than cpu_online_map.
> priority_high means search the interrupt vectors from the top down,
> rather than from the bottom up and use cpu_possible_map rather than
> cpu_online_map.

Checking to insure that at least one of the cpus is online seems to be
prudent, as well as what happens when the last online cpu in the group
goes offline?

Thanks,
Mike
>
> create_irq(void) ==> create_irq(int priority, cpumask_t *mask)
> priority_none, means do everything the way it is done now, passing in
> TARGET_CPUS as the mask, but also sending the priority arg. into
> __assign_irq_vector().
> priority_low and priority_high means use create_irq()'s mask arg. as the
> mask passed to __assign_irq_vector).
>
> We would add an additional small routine on top of create_irq() to do
> any massaging of the irq_desc, etc. that we need for these system vectors.
>
> Is that what you were thinking about?
>
> --ajm
>
>>> The interrupts don't necessarily fire on all cpus, it's just that
>>> they *can*
>>> fire on any cpu. For example, the GRU triggers an interrupt (it is very
>>> IPI'ish) to a particular cpu in the event of a GRU TLB fault. That
>>> cpu handles
>>> the fault and returns. But the fault can happen on any cpu, so all
>>> cpus need to
>>> be registered for the same vector and irq. This is probably splitting
>>> hairs; it
>>> is certainly no different in principal from timer interrupts or
>>> processor TLB
>>> faults.
>>
>> Reasonable. As long as you don't need to read a status register to
>> figure
>> out what to do that sounds reasonable. This does sound very much like
>> splitting hairs on a very platform specific capability.
>>
>> If we can generalize the mechanism to things like per cpu timer
>> interrupts and such so that we reduced the total amount of code we
>> have to maintain I would find it a very compelling mechanism.
>>
>>> As far as kernel_stat is concerned. I see you're point. NR_CPUS on our
>>> machines is going to be big (4K? 8K? something like that). NR_IRQS
>>> is also
>>> going to big because of that. It's unfortunate since the actual
>>> number of
>>> interrupt sources is going to be an order of magnitude smaller, at
>>> least.
>>
>> The number of interrupts sources is going to be smaller only because
>> SGI machines have or at least appear to have poor I/O compared to most
>> of the rest of machines in existence. NR_CPUS*16 is a fairly
>> reasonable estimate on most machines in existence. In the short term
>> it is going to get worse in the presence of MSI-X. I was talking to a
>> developer at Intel last week about 256 irqs for one card. I keep
>> having dreams about finding a way to just keep stats for a few cpus
>> but alas I don't think that is going to happen. Silly us.
>>
>> Eric
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/