Re: [BENCHMARK] 2.5.47{-mm1} with contest

From: Andrew Morton (akpm@digeo.com)
Date: Thu Nov 21 2002 - 12:21:47 EST

Next message: James Simmons: "Re: [Q] is framebuffer console code in 2.5.4x functional ?"
Previous message: Jeff Garzik: "Re: spinlocks, the GPL, and binary-only modules"
In reply to: Dave Jones: "Re: [BENCHMARK] 2.5.47{-mm1} with contest"
Next in thread: William Lee Irwin III: "Re: [BENCHMARK] 2.5.47{-mm1} with contest"
Reply: William Lee Irwin III: "Re: [BENCHMARK] 2.5.47{-mm1} with contest"
Reply: Dave Jones: "Re: [BENCHMARK] 2.5.47{-mm1} with contest"
Reply: Bill Davidsen: "Re: [BENCHMARK] 2.5.47{-mm1} with contest"

Dave Jones wrote:
>
> On Wed, Nov 20, 2002 at 10:54:40PM -0800, Andrew Morton wrote:
> > > I think this merits some investigation. I, for one, am a big user of
> > > SIGIO in userspace C programs...
> > OK, got it back to 119000. Each signal was calling copy_*_user 24 times.
> > This gets it down to six.
>
> Good eyes. But.. this also applies to 2.4 (which should also then
> get faster). So the gap between 2.4 & 2.5 must be somewhere else ?

But 2.4 already inlines the usercopy functions. With this benchmark,
the cost of the function call is visible. Same with the dir_rtn_1
test: it is performing zillions of 3-, 7- and 10-byte copies into userspace.
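
To illustrate the point about call overhead on tiny copies, here is a rough
userspace sketch, not the kernel's usercopy code. The names small_copy and
generic_copy and the 16-byte cutoff are made up for the example; the only
thing borrowed from the kernel is the convention of returning the number of
bytes that were NOT copied. Very small sizes are handled in an inline loop,
and only larger ones pay for the out-of-line call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an out-of-line copy routine tuned for big copies. */
size_t generic_copy(void *to, const void *from, size_t n);

/*
 * Hypothetical inline fast path: sizes up to 16 bytes (an arbitrary cutoff
 * chosen for this sketch) are copied in place, so no function call is made.
 * Returns the number of bytes not copied, which is always 0 here.
 */
static inline size_t small_copy(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    if (n <= 16) {
        while (n--)
            *d++ = *s++;
        return 0;
    }
    return generic_copy(to, from, n);
}

/* Out-of-line path: one call, then a bulk copy. */
size_t generic_copy(void *to, const void *from, size_t n)
{
    memcpy(to, from, n);
    return 0;
}
```

Whether a fast path like this pays off depends on how many call sites it is
inlined into; the extra code at each site costs cache footprint, which is the
tradeoff discussed next.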

The usercopy functions got themselves optimised for large copies and
cache footprint. Maybe we should inline them again. Maybe it doesn't
matter much.

> Also maybe we can do something about that multiple memcpy in copy_fpu_fxsave()
> In fact, that looks a bit fishy. We copy 10 bytes each memcpy, but
> advance the to ptr 5 bytes each iteration. What gives here ?
>

We'd buy a bit by arranging for the in-kernel copy of the fp state
to have the same layout as the hardware. That way it can be done in
a single big, fast, well-aligned slurp. But for some reason that code has
to convert into and out of a different representation.
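
A rough sketch of the difference, in plain userspace C. The struct and
function names are invented for illustration and are not the kernel's. The
only hardware fact assumed is that the FXSAVE image pads each 10-byte x87
register out to a 16-byte slot, while the older packed format stores them
back to back. With mismatched layouts you get one small copy per register;
with matching layouts the whole area moves in one go.

```c
#include <string.h>

#define NR_FP_REGS        8
#define FP_REG_BYTES      10  /* one 80-bit x87 register */
#define FXSAVE_SLOT_BYTES 16  /* FXSAVE pads each register to 16 bytes */

/* Hypothetical layouts, for illustration only. */
struct hw_fp_area     { unsigned char st[NR_FP_REGS][FXSAVE_SLOT_BYTES]; };
struct packed_fp_area { unsigned char st[NR_FP_REGS][FP_REG_BYTES]; };

/* Mismatched layouts: eight separate 10-byte copies. */
void convert_hw_to_packed(struct packed_fp_area *to,
                          const struct hw_fp_area *from)
{
    for (int i = 0; i < NR_FP_REGS; i++)
        memcpy(to->st[i], from->st[i], FP_REG_BYTES);
}

/* Matching layouts: the whole register area in a single copy. */
void copy_same_layout(struct hw_fp_area *to, const struct hw_fp_area *from)
{
    memcpy(to, from, sizeof(*to));
}
```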

But the real low-hanging fruit here is the observation that the
test application doesn't use floating point!!!

Maybe we need to take an fp trap now and then to "poll" the application
to see if it is still using float.
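
Something like the following toy model shows the idea. It is only a
simulation in userspace C: task_sim, fp_trap, fp_poll and deliver_signal are
hypothetical names, and nothing here touches real FPU state. The assumption
is that a per-task flag is set when the task takes an FP trap, cleared on
each "poll", and consulted at signal delivery to decide whether the FP state
needs copying at all.

```c
#include <stdbool.h>

/* Hypothetical per-task bookkeeping for this sketch. */
struct task_sim {
    bool used_fp;          /* set when the task last took an FP trap */
    unsigned int fp_saves; /* how many times FP state was copied out */
};

/* Simulated FP trap: the task touched the FPU while it was disabled. */
void fp_trap(struct task_sim *t)
{
    t->used_fp = true;
}

/*
 * Periodic "poll": forget earlier FP use. A task that is still using
 * floating point will trap again and set the flag back.
 */
void fp_poll(struct task_sim *t)
{
    t->used_fp = false;
}

/* Signal delivery: pay for the FP state copy only if FP was used recently. */
void deliver_signal(struct task_sim *t)
{
    if (t->used_fp)
        t->fp_saves++;
}
```

A task that never takes the trap after a poll delivers its signals without
any FP copy, which is the case the test application above would hit.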


This archive was generated by hypermail 2b29 : Sat Nov 23 2002 - 22:00:37 EST