Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32

From: Bryan O'Sullivan
Date: Tue Jan 10 2006 - 11:01:22 EST

Next message: Alexey Dobriyan: "2.6.15-$SHA1: VT <-> X sometimes odd"
Previous message: Dmitry Torokhov: "Re: PROBLEM: PS/2 keyboard does not work with 2.6.15"
In reply to: Sam Ravnborg: "Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32"
Next in thread: Bodo Eggert: "Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, 2006-01-10 at 01:18 -0800, Andrew Morton wrote:
> "Bryan O'Sullivan" <bos@xxxxxxxxxxxxx> wrote:
> >
> > This arch-independent routine copies data to a memory-mapped I/O region,
> > using 32-bit accesses. It does not guarantee access ordering, nor does
> > it perform a memory barrier afterwards. This style of access is required
> > by some devices.
>
> Not providing orderingor barriers makes this a rather risky thing to export
> - people might use it, find their driver "works" on one architecture, but
> fails on another.

The kdoc comments for the routine clearly state these limitations, so I
hope that between the comments and the naming, the risk is minimal.

> I guess the "__" is a decent warning of this, and the patch anticipates a
> higher-level raw_memcpy_toio32() which implements those things, yes?

It leaves room for it, yes, though I don't see much reason to add such a
routine until a driver specifically needs it.

> How come __raw_memcpy_toio32() is inlined?

There's no technical reason for it to be. I'm simply trying to find an
acceptable way to get the code into the tree that accommodates per-arch
implementations.

So let me rewind a little and state my problem.

My driver needs a copy routine that works in 32-bit chunks, writes to
MMIO space, doesn't care about ordering, and doesn't guarantee a memory
barrier. It also very much wants individual arches to be able to
implement this routine themselves; even though it's a small, simple
loop, we've benchmarked our x86_64 version as giving us 5% better
performance overall (i.e. visible to apps, not just microbenchmarks)
when doing reasonably large copies. I'd expect other arches to have
similar benefits.

I'm more than willing to recast the code into whatever form makes people
happy, but it would be greatly beneficial to me if it also met my
requirements.

So far, my attempts have been thus:

* out of line, with __HAVE_ARCH_XXX to avoid bloat on arches that
have specialised implementations - __HAVE_ARCH_XXX is out of
style
* out of line, unconditional - made people unhappy on bloat
grounds, since arches that have specialised implementations end
up with an extra unneeded routine
* inline, apparently in Linus's preferred style - an inline that
isn't really necessary

For a routine whose C implementation is six lines long, it's had an
impressive submission history :-)

What do you suggest I try next? I'll be happy to try a different tack.

<b

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alexey Dobriyan: "2.6.15-$SHA1: VT <-> X sometimes odd"
Previous message: Dmitry Torokhov: "Re: PROBLEM: PS/2 keyboard does not work with 2.6.15"
In reply to: Sam Ravnborg: "Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32"
Next in thread: Bodo Eggert: "Re: [PATCH 1 of 3] Introduce __raw_memcpy_toio32"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]