Re: [PATCH] irq: Add node_affinity CPU masks for smarter irqbalance hints

From: Waskiewicz Jr, Peter P
Date: Mon Nov 23 2009 - 18:32:42 EST


On Mon, 23 Nov 2009, Peter Zijlstra wrote:

> On Mon, 2009-11-23 at 01:36 -0800, Peter P Waskiewicz Jr wrote:
>
> > This mechanism isn't going to be used by any internal kernel mechanism
> > for determining interrupt placement or operation. It's purely something
> > that either a driver can modify, or external script (through /proc),
> > that irqbalance will make use of. If irqbalance isn't running, or the
> > current version of irqbalance doesn't support reading node_affinity,
> > then it won't affect the system's operation.
> >
> > If irqbalance does support it, it'll read whatever the supplied mask is,
> > and then will try and balance interrupts within that mask. It will bail
> > if the mask is invalid, or won't apply to the running system, just like
> > how putting a bogus mask into smp_affinity is ignored.
> >
> > If there's something I'm missing beyond this with the two suggestions
> > you've made (I looked into those two parameters and tried to draw
> > conclusions), please let me know.
>
> I don't see the point in adding it, if the driver wants to set a node
> cpu mask it can already do that using the regular smp affinity settings.

Unfortunately, a driver can't. The irq_set_affinity() function isn't
exported. I proposed a patch on netdev to export it, and then to pin an
interrupt using IRQF_NOBALANCING so that irqbalance wouldn't touch it.
That was rejected because it would have the driver, rather than
irqbalance, enforcing interrupt-balancing policy.

Jesse Brandeburg and I met with Arjan about this. What we came up with
was this interface: drivers can state what they'd like to see, and
irqbalance decides whether to honor it. That way interrupt affinity
policy is still set only by irqbalance, but the interface gives us a way
to hint to irqbalance what we'd like it to do.
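
To illustrate the hinting idea, here is a rough sketch of how a userspace
balancer might treat a node_affinity hint. This is not irqbalance's actual
code. The function names are made up for the example, and it assumes the
hint uses the same comma-separated 32-bit hex groups that smp_affinity
does. An invalid mask, or one that matches no online CPU, is ignored and
the balancer falls back to all online CPUs.

```python
import string


def parse_cpumask(text):
    """Parse a hex CPU mask such as 'f' or '00000001,00000000'.

    Groups are 32-bit words, most significant first (an assumption that
    mirrors the smp_affinity format). Returns a set of CPU ids, or None
    if the text is not a valid mask.
    """
    value = 0
    for group in text.strip().split(","):
        # Each group must be 1 to 8 hex digits; anything else is bogus.
        if not 1 <= len(group) <= 8:
            return None
        if any(c not in string.hexdigits for c in group):
            return None
        value = (value << 32) | int(group, 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def pick_cpus(hint_text, online_cpus):
    """Return the CPUs to balance within (hypothetical helper).

    The hint is honored only where it overlaps the online CPUs;
    otherwise the balancer keeps its default of all online CPUs.
    """
    online = set(online_cpus)
    hinted = parse_cpumask(hint_text)
    if hinted is None:
        return online
    usable = hinted & online
    return usable if usable else online
```

On a four-CPU system, pick_cpus("c", range(4)) would restrict balancing to
CPUs 2 and 3, while a hint of "f0" (CPUs 4-7, none of them online) would be
ignored.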

Also, if you use the /proc interface to change smp_affinity on an
interrupt without any of these changes, irqbalance will override it on its
next poll interval. That is not desirable either.
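
For reference, the manual route looks roughly like the sketch below: build
a hex mask from a list of CPUs and write it to /proc/irq/<N>/smp_affinity.
The helper names and the proc_root parameter are made up for the example,
and the write needs root. With irqbalance running, a mask set this way
only lasts until the next poll.

```python
def format_cpumask(cpus):
    """Format CPU ids as comma-separated 32-bit hex groups, e.g. '00000003'."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    groups = []
    while True:
        groups.append(value & 0xFFFFFFFF)
        value >>= 32
        if not value:
            break
    # Most significant group first.
    return ",".join(f"{group:08x}" for group in reversed(groups))


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity.

    Hypothetical helper; proc_root is a parameter only so the function
    can be pointed at a scratch directory when trying it out.
    """
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(format_cpumask(cpus))
```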

Cheers,
-PJ