Re: [PATCH] Minor scheduler fix to get rid of skipping in xmms

From: Andrew Theurer
Date: Wed Sep 10 2003 - 21:53:59 EST


Robert Love <rml@xxxxxxxxx> wrote:
>
>> There are a _lot_ of scheduler changes in 2.6-mm, and who knows which
>> ones are an improvement, a detriment, and a noop?

> We know that sched-2.6.0-test2-mm2-A3.patch caused the regression, and
> we know that sched-CAN_MIGRATE_TASK-fix.patch mostly fixed it up.

> What we don't know is whether the thing which
> sched-CAN_MIGRATE_TASK-fix.patch fixed was the thing which
> sched-2.6.0-test2-mm2-A3.patch broke.

Sorry for jumping into this late. I didn't even know the can_migrate patch
was being discussed, let alone in -mm :). And to be fair, this really is
Ingo's aggressive idle steal patch.

Anyway, these patches are somewhat related. It would seem that A3's
shortening the tasks' run time would not only slow performance because of
cache thrash, but could possibly break CAN_MIGRATE's cache warmth check,
right? That in turn would stop load balancing from working well, leading to
more idle time, which the CAN_MIGRATE patch sort of bypassed for idle cpus.
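
To make the interaction concrete, here is a rough sketch of the kind of cache warmth test I mean. This is not the code from either patch. The struct, the function name and the parameter names are made up for illustration, and the only thing it assumes is that a task counts as cache-warm if it ran within some decay window, with an idle cpu allowed to skip that test.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a task; not the kernel's task_struct. */
struct task_sketch {
    unsigned long last_run; /* tick at which the task last ran (assumed field) */
    bool running;           /* true if the task is executing right now */
};

/*
 * Sketch of a migration test with an idle bypass.
 * A running task is never moved. A task that ran within the last
 * cache_decay_ticks is treated as cache-warm and left alone, unless the
 * stealing cpu is idle, in which case warmth is ignored.
 */
static bool can_migrate_sketch(const struct task_sketch *t, unsigned long now,
                               unsigned long cache_decay_ticks,
                               bool this_cpu_idle)
{
    if (t->running)
        return false;
    if (!this_cpu_idle && (now - t->last_run) <= cache_decay_ticks)
        return false;
    return true;
}
```

If run times get shorter, more tasks will have a recent last_run at any given moment, so a busy cpu would refuse more of them under a test like this, while an idle cpu would still take them.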

I see Nick's balance patch as somewhat harmless, at least combined with A3
patch. However, one concern is that the "ping-pong" steal interval is not
really 200ms, but 200ms/(nr_cpus-1), which without A3, could show up as a
problem, especially on an 8 way box. In addition, I do think there's a
problem with the number of tasks we steal. It should not be imbalance/2, it
should be:
max_load - (node_nr_running / num_cpus_node). If we steal any more than
this, which is quite possible with imbalance/2, then it's likely this_cpu now
has too many tasks, and some other cpu will steal again. Using *imbalance/2
works fine on 2-way smp, but I'm pretty sure we "over steal" tasks on 4 way
and up. Anyway, I'm getting off topic here...
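
For anyone who wants to plug in numbers, the arithmetic above can be sketched as below. These are stand-alone helpers with made-up names, not scheduler code, and they assume that imbalance is (max_load - this_load) and that all divisions are integer divisions.

```c
/*
 * Effective steal interval if each of the other cpus may steal once per
 * base interval: base_ms / (nr_cpus - 1). Returns base_ms on a single cpu.
 */
static unsigned int effective_steal_interval_ms(unsigned int base_ms,
                                                unsigned int nr_cpus)
{
    return nr_cpus > 1 ? base_ms / (nr_cpus - 1) : base_ms;
}

/* Steal count using imbalance/2, assuming imbalance = max_load - this_load. */
static unsigned int steal_half_imbalance(unsigned int max_load,
                                         unsigned int this_load)
{
    return max_load > this_load ? (max_load - this_load) / 2 : 0;
}

/*
 * Steal count using the suggested formula:
 * max_load - (node_nr_running / num_cpus_node).
 */
static unsigned int steal_to_node_average(unsigned int max_load,
                                          unsigned int node_nr_running,
                                          unsigned int num_cpus_node)
{
    unsigned int avg = num_cpus_node ? node_nr_running / num_cpus_node : 0;

    return max_load > avg ? max_load - avg : 0;
}
```

With a 200ms base interval, effective_steal_interval_ms(200, 8) gives 28ms on an 8 way box and 200ms on a 2 way. For a 4 way node with run queue lengths 4, 4, 4 and 0 (12 tasks, average 3), the idle cpu takes 2 tasks with imbalance/2 and leaves the busiest queue at 2, below the average, while the suggested formula takes 1 and leaves it at 3.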

But Steve's latest results have me totally stumped. Why would a patch which
shortens run time and probably thrashes cache improve a cpu bound workload
like JBB? And why would a patch that makes sure idle cpus don't stay idle
reduce performance by so much?

Steve, are you absolutely sure your latest results on test5 are correct? Any
possibility the original results were the "good" ones?

FWIW, I have seen the CAN_MIGRATE patch make a huge difference, not just in
testing, but a -real- enterprise application used in "production". And
unlike JBB and Volano, there's no high rate of sched_yield either. They do
have a high rate of cswitches, but only because their workload is message
driven. This patch made a 40% improvement on 4-way on a 2.4 distro kernel
that has O(1).

-Andrew Theurer




