Re: [patch] sched: fix improper load balance across sched domain

From: Ken Chen
Date: Wed Oct 17 2007 - 13:08:20 EST


On 10/16/07, Siddha, Suresh B <suresh.b.siddha@xxxxxxxxx> wrote:
> On Tue, Oct 16, 2007 at 12:07:06PM -0700, Ken Chen wrote:
> > We recently discovered a nasty performance bug in the kernel CPU load
> > balancer where we were hit by 50% performance regression.
> >
> > When tasks are assigned to a subset of CPUs that span across
> > sched_domains (either ccNUMA node or the new multi-core domain) via
> > cpu affinity, kernel fails to perform proper load balance at
> > these domains, due to several logic in find_busiest_group() miss
> > identified busiest sched group within a given domain. This leads to
> > inadequate load balance and causes 50% performance hit.
> >
> > To give you a concrete example, on a dual-core, 2 socket numa system,
> > there are 4 logical cpu, organized as:
>
> oops, this issue can easily happen when cores are not sharing caches. I
> think this is what happening on your setup, right?

yes, we observed the bad behavior on quad-core system with separate L2
cache as well.

- Ken
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/