Re: Industry db benchmark result on recent 2.6 kernels

From: Ingo Molnar
Date: Thu Mar 31 2005 - 23:53:22 EST



* Chen, Kenneth W <kenneth.w.chen@xxxxxxxxx> wrote:

> The low point in 2.6.11 could very well be due to the change in the
> scheduler. It does too much load balancing in the wake-up path and
> possibly makes a lot of unwise decisions. For example, in
> try_to_wake_up(), it will try SD_WAKE_AFFINE for a task that is not
> hot. "Not hot" is decided by looking at when the task last ran and
> comparing that to a constant, sd->cache_hot_time. The problem is that
> this cache_hot_time is fixed for the entire universe, whether it is a
> little Celeron processor with 128KB of cache or a server-class
> Itanium2 processor with 9MB of L3 cache. This one-size-fits-all
> approach isn't really working at all.

the current scheduler queue in -mm has some experimental bits as well
which will reduce the amount of balancing. But we cannot just merge them
en bloc right now; there's been too much back and forth in recent
kernels. The safe-to-merge-for-2.6.12 bits are already in -BK.

> We experimented with that parameter earlier and found it was one of
> the major sources of the low point in 2.6.8. I debated the issue on
> LKML about 4 months ago and finally everyone agreed to make that
> parameter a boot time param. The change made it into the bk tree for
> the 2.6.9 release, but somehow it got ripped right out 2 days after it
> went in. I suspect 2.6.11 is a replay of 2.6.8 for the regression in
> the scheduler. We are running experiments to confirm this theory.

the current defaults for cache_hot_time are 10 msec for NUMA domains,
and 2.5 msec for SMP domains. Clearly too low for CPUs with 9MB cache.
Are you increasing cache_hot_time in your experiment? If that solves
most of the problem that would be an easy thing to fix for 2.6.12.
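
as a rough illustration of the check being discussed (a sketch only, not
the kernel source): a task is treated as cache hot when the time since it
last ran is below the domain's cache_hot_time. The struct and function
names below are made up for the example; only the 2.5 msec and 10 msec
defaults come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Defaults mentioned in this thread, expressed in nanoseconds. */
#define SMP_CACHE_HOT_NS   2500000ULL   /* 2.5 msec */
#define NUMA_CACHE_HOT_NS  10000000ULL  /* 10 msec */

/* Hypothetical stand-in for the per-domain tunable (not the real struct). */
struct domain_sketch {
    uint64_t cache_hot_time_ns;
};

/*
 * A task counts as "hot" if it last ran less than cache_hot_time ago.
 * A larger cache_hot_time keeps tasks "hot" for longer, which is the
 * direction a CPU with a very large cache would presumably want.
 */
static bool task_is_hot(uint64_t now_ns, uint64_t last_ran_ns,
                        const struct domain_sketch *sd)
{
    assert(now_ns >= last_ran_ns);  /* sketch assumes a monotonic clock */
    return (now_ns - last_ran_ns) < sd->cache_hot_time_ns;
}
```

with these numbers, a task that last ran 5 msec ago is not hot under the
SMP default but still hot under the NUMA default, so the same task can be
treated differently purely because of the constant.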

Ingo