Strange performance behavior of 2.4.0-test9

From: kumon@flab.fujitsu.co.jp
Date: Wed Oct 25 2000 - 02:36:41 EST


Hi,
I found very odd performance behavior of 2.4.0-test9 on a large SMP
server, and I want some clues to investigate it.

The workload is WebBench,
Machine is:
        8way PII-Xeon 450MHz 2MB cache with Profusion chip-set,
        1GB Main memory with no highmem,
        using 4 eepro100 networks.

As shown in the table below, test8 is very preferable in both absolute
performance and scalability.

WebBench performance results.
                Req/sec(4cpu->8cpu)
--------------------------
2.4.0-test1 2816->3702 (31%up)
2.4.0-test8 4006->5287 (63%up)
2.4.0-test9 3669->2193 (40%down)
        (4cpu means two CPUs for each bus.)

But the performance of test9 shows following problems:

1) At the 8 cpu configuration, test9 shows extremely inferior
   performance.
2) on test8, 8-cpu configuration shows about 2/3 performance of 4-cpu.
   Other experiment shows, as # of cpu increases from 4 to 8, the
   performance gradually decreases.
        CPUs Req/s
        4cpu 3660
        5cpu 3070
        6cpu 2730
        7cpu 2611
        8cpu 2210

The reason of the performance degradation is increase of idle-time,
and increase of context switches (2.2x @4cpu, 3.1x @8cpu).

The following table shows one minute average of "vmstat 1".

configuration Req/s | r b intr c-sw user os idle
------------------------+--------------------------------------------------
2.4.0t1 4cpu 2816 | 60 0 28877 5103 18 82 0
        8cpu 3702 | 37 0 33495 16387 14 86 0
2.4.0t8 4cpu 4066 | 57 0 38743 8674 27 73 0
        8cpu 5287 | 33 0 40755 30626 21 78 0
2.4.0t9 4cpu 3669 | 53 3 35822 18898 25 73 2
        8cpu 2193 | 5 8 22114 94609 9 52 39

In the experiments, enough free memory was keeped,
no swap-in/out happened, almost no block-in/out occured.
interrups is roughly propotional to the performance.

Performance down in 2.4.0-t9 4cpu is not so prominent but actually some
processes are blocked running by some reason.
And as the number of CPUs increases the performance degrades.
On 8cpu, only 1/10 of the processes are runnable.

I've tried test10-pre5 but nothing improved.

Does anybody tell me what I should do?

--
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:15 EST