Re: [RFC] Correct behaviour of irq affinity?

From: Eric W. Biederman
Date: Tue Mar 24 2009 - 08:40:38 EST


Rusty Russell <rusty@xxxxxxxxxxxxxxx> writes:

> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time. In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain. Is this right, or
> should we just set it to what the user said? Or as now, indicate that we're
> restricting it.
>
> If we should change it, here's what the patch looks like against x86 tip
> (cpu_mask_to_apicid_and already takes cpu_online_mask into account):

desc->affinity should be what the user requested, if it is at all
possible to honor the user space request. YH, the fact that we do not
currently exercise the full freedom that user space gives us is
irrelevant.

Further, setting desc->affinity to the user space request is what
x86_64 did before the grand merger.

Likewise, desc->affinity & cfg->domain & cpu_online_map going into the
selection of the apic id is what the code did before the grand merger,
and what the code is currently doing. So logically that looks good.
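
To make the split concrete, here is a rough userspace model, not kernel
code. Each bit of a uint64_t stands for one cpu, and every name starting
with model_ is made up for illustration. It only shows the idea: store the
request verbatim, and narrow it by the domain and the online cpus at the
point where a destination is picked.

```c
#include <stdint.h>

/* Userspace model only: each bit of a uint64_t stands for one cpu. */
typedef uint64_t cpu_bits;

struct model_irq {
	cpu_bits affinity;	/* models desc->affinity: what the user asked for */
	cpu_bits domain;	/* models cfg->domain: cpus the vector is set up on */
};

/* Record the request verbatim; do not narrow it here. */
static void model_set_affinity(struct model_irq *irq, cpu_bits requested)
{
	irq->affinity = requested;
}

/* Narrow only when choosing where the interrupt may actually be delivered. */
static cpu_bits model_dest_candidates(const struct model_irq *irq,
				      cpu_bits online)
{
	return irq->affinity & irq->domain & online;
}
```

With a request of cpus 0-3, a domain of cpus 2-3 and cpus 0-2 online, the
stored affinity stays cpus 0-3 while the only delivery candidate is cpu 2.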

YH has a point that several of the implementations of
cpu_mask_to_apicid do not take cpu_online_map into account and should
probably be fixed. flat_cpu_mask_to_apicid was the one I could find.
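
As a hedged illustration of what such a fix amounts to, the sketch below
models a flat-mode helper in userspace. It assumes the flat helper simply
returns the low eight bits of the mask as the logical destination; the
names model_flat_apicid, model_flat_apicid_online and MODEL_FLAT_CPUS are
hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* Userspace model only: each bit of a uint64_t stands for one cpu. */
typedef uint64_t cpu_bits;

/* Flat logical mode can address at most 8 cpus, one bit per cpu. */
#define MODEL_FLAT_CPUS 0xFFu

/* Models a helper that ignores which cpus are online. */
static unsigned int model_flat_apicid(cpu_bits mask)
{
	return (unsigned int)(mask & MODEL_FLAT_CPUS);
}

/* Models the same helper with the online cpus taken into account. */
static unsigned int model_flat_apicid_online(cpu_bits mask, cpu_bits online)
{
	return (unsigned int)(mask & online & MODEL_FLAT_CPUS);
}
```

The first variant can return destination bits for cpus that have been
unplugged; the second cannot.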

Also, now that I look at it, there is one other bug in this routine
that you have missed. set_extra_move_desc should be called before
we set desc->affinity, as it compares the old desc->affinity with the
new value to see if we are going to be running on a new cpu, and if so
we may need to reallocate irq_desc onto a new NUMA node.
set_extra_move_desc looks a little fishy, but it doesn't stand a chance
if it is called with the wrong data.
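
The ordering problem can be shown with a small userspace model. This is a
sketch under assumptions, not the kernel routine: it assumes the check is
"does the new mask share any cpu with the old affinity", and the struct and
model_ function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: each bit of a uint64_t stands for one cpu. */
typedef uint64_t cpu_bits;

struct model_desc {
	cpu_bits affinity;	/* models desc->affinity */
	bool move_pending;	/* models "irq_desc may need to move" */
};

/* Stand-in for set_extra_move_desc(): flag a move if no cpu is shared. */
static void model_set_extra_move(struct model_desc *d, cpu_bits new_mask)
{
	if ((d->affinity & new_mask) == 0)
		d->move_pending = true;
}

/* Order as posted: the old affinity is overwritten before the compare. */
static void model_update_copy_first(struct model_desc *d, cpu_bits new_mask)
{
	d->affinity = new_mask;
	model_set_extra_move(d, new_mask);	/* compares new_mask with itself */
}

/* Suggested order: compare against the old affinity, then overwrite it. */
static void model_update_compare_first(struct model_desc *d, cpu_bits new_mask)
{
	model_set_extra_move(d, new_mask);
	d->affinity = new_mask;
}
```

Moving from cpu 0 to cpu 2, the copy-first order never flags a move because
a non-empty mask always intersects itself, while the compare-first order
does.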

Overall I like it.

Do you think you could fix those two issues and regenerate the patch?

> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 86827d8..30906cd 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -592,10 +592,10 @@ set_desc_affinity(struct irq_desc *desc, const struct cpumask *mask)
>  	if (assign_irq_vector(irq, cfg, mask))
>  		return BAD_APICID;
>
> -	cpumask_and(desc->affinity, cfg->domain, mask);
> +	cpumask_copy(desc->affinity, mask);
>  	set_extra_move_desc(desc, mask);
>
> -	return apic->cpu_mask_to_apicid_and(desc->affinity, cpu_online_mask);
> +	return apic->cpu_mask_to_apicid_and(desc->affinity, cfg->domain);
>  }
>
> static void

Eric