Re: [Bug #11308] tbench regression on each kernel release from 2.6.22-> 2.6.28

From: Eric Dumazet
Date: Mon Nov 17 2008 - 12:34:18 EST

Next message: Robert Richter: "Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()"
Previous message: KOSAKI Motohiro: "Re: [PATCH] vmscan: fix get_scan_ratio comment"
In reply to: Ingo Molnar: "Re: [Bug #11308] tbench regression on each kernel release from2.6.22 -> 2.6.28"
Next in thread: Linus Torvalds: "Re: [Bug #11308] tbench regression on each kernel release from 2.6.22-> 2.6.28"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Ingo Molnar a écrit :

* Ingo Molnar <mingo@xxxxxxx> wrote:

4% on my machine, but apparently my machine is sooooo special (see oprofile thread), so maybe its cpus have a hard time playing with a contended cache line.

It definitly needs more testing on other machines.

Maybe you'll discover patch is bad on your machines, this is why it's in net-next-2.6
ok, i'll try it on my testbox too, to check whether it has any effect - find below the port to -git.

it gives a small speedup of ~1% on my box:

before: Throughput 3437.65 MB/sec 64 procs
after: Throughput 3473.99 MB/sec 64 procs

Strange, I get 2350 MB/sec on my 8 cpus box. "tbench 8"

... although that's still a bit close to the natural tbench noise range so it's not conclusive and not like a smoking gun IMO.

But i think this change might just be papering over the real scalability problem that this workload has in my opinion: that there's a single localhost route/dst/device that millions of packets are squeezed through every second:

Yes, this point was mentioned on netdev a while back.

phoenix:~> ifconfig lo
lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0 RX bytes:679809512144 (633.1 GiB) TX bytes:679809512144 (633.1 GiB)

There does not seem to be any per CPU ness in localhost networking - it has a globally single-threaded rx/tx queue AFAICS even if both the client and server task is on the same CPU - how is that supposed to perform well? (but i might be missing something)

Stephen had a patch for this one too, but we got tbench noise too with this patch

http://kerneltrap.org/mailarchive/linux-netdev/2008/11/5/3926034

What kind of test-system do you have - one with P4 style Xeon CPUs perhaps where dirty-cacheline cachemisses to DRAM were particularly expensive?

Its a HP BL460c g1

Dual quad-core cpus Intel E5450 @3.00GHz

So 8 logical cpus. My bench was "tbench 8"

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Robert Richter: "Re: [PATCH] oprofile: re-arm APIC_DM_NMI in ppro_check_ctrs()"
Previous message: KOSAKI Motohiro: "Re: [PATCH] vmscan: fix get_scan_ratio comment"
In reply to: Ingo Molnar: "Re: [Bug #11308] tbench regression on each kernel release from2.6.22 -> 2.6.28"
Next in thread: Linus Torvalds: "Re: [Bug #11308] tbench regression on each kernel release from 2.6.22-> 2.6.28"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Re: [Bug #11308] tbench regression on each kernel release from 2.6.22-&gt; 2.6.28

Re: [Bug #11308] tbench regression on each kernel release from 2.6.22-> 2.6.28