Re: [PATCH] sched: Revert 14a40ffccd61 ("sched: replacePF_THREAD_BOUND with PF_NO_SETAFFINITY")

From: Tejun Heo
Date: Mon Dec 02 2013 - 15:15:46 EST


Hello, Ingo.

On Mon, Dec 02, 2013 at 06:28:37PM +0100, Ingo Molnar wrote:
> No, NUMA is a performance issue in this specific case, not a
> correctness issue. 'Correctness' for configuration details mostly
> means 'stuff does not crash'.

That's exactly the thing. Workqueue users can set any random affinity
be that per-cpu, per-NUMA or whatever (I'm not talking about the
automatic NUMA affinity here), and they should be able to depend on
the configured affinity being honored as long as that request has
succeeded. Changing affinity underneath workqueue code actively
breaks the API and has possibility to lead to crashes.

> We try to give good, sensible, well performing defaults out of box,
> but otherwise we try to let root mis-configure (or improve!) their
> systems rather widely.

Yeah, sure, that's why there's custom attributes interface which can
be exposed via sysfs.

> That's only because 'private affinity' attributes detached scheduler
> affinity handling snuck into the workqueue code. It's bad design and
> it needs to be fixed - and the worst aspect is undone via this revert.

You can't do that because of the very nature of workqueue - the tasks
are completely ephemeral and automatically pooled. Nothing is
per-task. There are attributes which aren't even defined for
individual tasks. For workqueue attributes to be customizable, it
needs some form of custom interface.

> We already have APIs to manage affinities, in the scheduler.
> Privatizing this function into workqueue.c is limiting and actively
> wrong.

Those are pool attributes which the tasks are keyed by. workqueue
needs to know them to put the matching ones in the same pool.

> Such APIs should be done properly, by extending existing scheduler
> APIs if needed, not by turning off the old APIs via a crude flag and
> creating some private, per subsystem implementation ...

I don't really follow this logic. workqueue is a super structure
built on top of tasks and has certain requirements on how its member
tasks behave and will break, in ways which would be difficult to track
down, if those requirements are breached. This is something
structurally fundamental.

I suppose workqueue can taint / generate warnings if workers which are
breaking out of the requested parameters are detected, if people are
hell-bent on doing it, but the pros aren't clear at all to me.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/