Re: newidle balancing in NUMA domain?

From: Mike Galbraith
Date: Mon Nov 23 2009 - 09:37:49 EST


On Mon, 2009-11-23 at 12:22 +0100, Nick Piggin wrote:
> Hi,
>
> I wonder why it was decided to do newidle balancing in the NUMA
> domain? And with newidle_idx == 0 at that.
>
> This means that every time the CPU goes idle, every CPU in the
> system gets a remote cacheline or two hit. Not very nice O(n^2)
> behaviour on the interconnect. Not to mention trashing our
> NUMA locality.

Painful on little boxen too if left unchained.

> And then I see some proposal to do ratelimiting of newidle
> balancing :( Seems like hack upon hack making behaviour much more
> complex.

That's mine, and yeah, it is hackish. It just keeps newidle at bay for
high speed switchers while keeping it available to kick start CPUs for
fork/exec loads. Suggestions welcome. I have a threaded testcase
(x264) where turning the thing off costs ~40% throughput. Take that
same testcase (or ilk) to a big NUMA beast, and performance will very
likely suck just as bad as it does on my little Q6600 box.
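
For illustration only, here is a minimal userspace sketch of what
ratelimiting newidle could look like. It is not the actual patch. The
struct, the function names, the 1/8 averaging weight and the 500us
threshold are all assumptions made up for the example. The idea: track
how long this CPU tends to stay idle, and skip the newidle pull when it
usually goes busy again before a migration could pay for itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; not real kernel fields. */
struct cpu_idle_stats {
	uint64_t avg_idle_ns;	/* running average of recent idle periods */
};

/* Assumed threshold: idle periods shorter than this are not worth a pull. */
#define MIGRATION_COST_NS 500000ULL

/* Fold a new idle-period sample into the average with a 1/8 weight. */
static void update_avg_idle(struct cpu_idle_stats *s, uint64_t sample_ns)
{
	int64_t diff = (int64_t)sample_ns - (int64_t)s->avg_idle_ns;

	s->avg_idle_ns = (uint64_t)((int64_t)s->avg_idle_ns + diff / 8);
}

/*
 * High speed switchers produce short idle periods, so the average drops
 * below the threshold and newidle is skipped.  A CPU that sits idle for
 * a long time (e.g. waiting on fork/exec work) stays above it and may
 * still pull.
 */
static bool should_newidle_balance(const struct cpu_idle_stats *s)
{
	return s->avg_idle_ns >= MIGRATION_COST_NS;
}
```

A caller would feed update_avg_idle() the length of each idle period as
the CPU leaves idle, and check should_newidle_balance() before walking
the domains on the way into idle.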

Other than that, I'd be most happy to see the thing crawl back in its
cave and _die_ despite the little gain it provides for a kbuild. It has
been (is) very annoying.

> One "symptom" of bad mutex contention can be that increasing the
> balancing rate can help a bit to reduce idle time (because it
> can get the woken thread which is holding a semaphore to run ASAP
> after we run out of runnable tasks in the system due to them
> hitting contention on that semaphore).

Yes, when mysql+oltp starts jamming up, load balancing helps bust up the
logjam somewhat, but that's not at all why newidle was activated.

> I really hope this change wasn't done in order to help -rt or
> something sad like sysbench on MySQL.

Newidle was activated to improve fork/exec CPU utilization. A nasty
side effect is that it tries to rip other loads to tatters.

> And btw, I'll stay out of mentioning anything about CFS development,
> but it really sucks to be continually making significant changes to
> domains balancing *and* per-runqueue scheduling at the same time :(
> It makes it even difficult to bisect things.

Yeah, balancing got jumbled up with desktop tweakage. Much fallout this
round, and some things still to be fixed back up.

-Mike
