Re: Linux-2.1.4

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 17 Oct 1996 07:51:13 +0300 (EET DST)


On Wed, 16 Oct 1996, Theodore Y. Ts'o wrote:
>
> Trust me, the new code still _is_ snappier than just about everything. In
> fact, I can more-or-less guarantee that it cannot be done faster than we do
> it now (modulo some localized optimizations). But the new code is also
> _different_ than the old code, and doing loops a byte at a time
> degraded from just "stupid" (in 2.0.x) to outright "insane" (in 2.1.x).
>
> Well, the tty layer code does this all over, mostly to avoid the number
> of copies that it needed to make --- remember, for the tty code a "byte"
> is the natural size that you generally want to work with. (For an i386
> running at 40MHz, that is the fastest way to do things; I can understand
> that on an Alpha w/o byte instructions, it might be much more painful.)
>
> So will it be faster now to copy everything from userspace into a kernel
> "bounce buffer" first, instead of fetching things from user space one
> byte at a time? That seems counter-intuitive, since traditionally the
> way you get speedups is to *reduce* the number of memory copies while
> going through a network or tty layer....

Don't worry. Byte-sized accesses aren't really _that_ expensive. In places
where byte-sized copies are natural (and I'd agree that the tty layer
certainly counts as one) just continue to use them. After all, the tty layer
tends to do some operations on those bytes, so it's actually _logical_ to get
them as bytes rather than in larger blocks.

The places I reacted against were not places like the tty layer, but places
that really don't do "byte" operations in the first place. The ELF loader
really does a "memset()", which is definitely not a byte-at-a-time operation
except for the most stupid implementation. Similarly, most fast "strncpy()"
implementations tend to do word copies and do various tricks to find the zero
in the word. And I'm not talking about the kernel implementation here: I'm
talking about optimized _libc_ implementations.
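
As an illustration of the "find the zero in the word" trick mentioned above,
here is a minimal user-space sketch. It is not the kernel's code and not any
particular libc's code; the names has_zero_byte() and find_zero() are
hypothetical, and a 32-bit word is assumed purely for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Nonzero if any of the four bytes of w is zero.
 * Subtracting 1 from each byte borrows into the top bit only when
 * that byte was 0 (or when its top bit was already set, which the
 * "& ~w" term masks out again).
 */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/*
 * Return the index of the first zero byte in buf[0..n), or n if
 * there is none. Whole words are scanned first; only the word that
 * contains the zero (or the short tail) is examined byte by byte.
 */
static size_t find_zero(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    while (i + sizeof(uint32_t) <= n) {
        uint32_t w;

        memcpy(&w, buf + i, sizeof w);  /* alignment-safe word load */
        if (has_zero_byte(w))
            break;
        i += sizeof w;
    }
    while (i < n && buf[i] != 0)
        i++;
    return i;
}
```

For example, find_zero() on the 11-byte array "hello, tty" (terminator
included) checks two whole words and then three tail bytes, instead of
testing all eleven bytes one at a time.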

Note that the overhead of "get_user()" is something like 10 assembler
instructions. It's NOT a costly operation in itself, but the instructions do
add up if you keep on doing them ;)

In short: if you actually do some _operation_ on the byte or word you're
fetching, the 10 instructions to fetch it are generally not the problem, and
doing double-buffering would only complicate the code (and certainly cost more
than 10 instructions). It's only if you're doing things like area copies or
clears that byte-wise operations are really silly, because there is obviously
a much better way to do them.
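
The difference between the two styles can be sketched in plain user-space C.
This is only a model under stated assumptions: struct region and the three
functions below are hypothetical stand-ins, with a bounds check playing the
role of the fixed per-access cost of a get_user()/put_user()-style call. It
does not reproduce the real kernel interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a range of memory we are allowed to touch. */
struct region {
    unsigned char *base;
    size_t len;
};

/* One checked byte store: the fixed overhead is paid on every call. */
static int put_byte_checked(struct region *r, size_t off, unsigned char v)
{
    if (off >= r->len)
        return -1;
    r->base[off] = v;
    return 0;
}

/* Byte-wise clear: n checks for n bytes. */
static int clear_bytewise(struct region *r, size_t off, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (put_byte_checked(r, off + i, 0) != 0)
            return -1;
    return 0;
}

/* Area clear: one check for the whole range, then a single memset(). */
static int clear_area(struct region *r, size_t off, size_t n)
{
    if (off > r->len || n > r->len - off)
        return -1;
    memset(r->base + off, 0, n);
    return 0;
}
```

Both functions leave the same bytes cleared on success, but clear_bytewise()
pays the check once per byte while clear_area() pays it once per call and
lets memset() use whatever wide stores the platform offers. A loop that
actually inspects or transforms each byte, as the tty layer does, has no
equivalent shortcut, which is why the per-byte fetch is not the problem there.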

(And I wouldn't worry about an Alpha keeping up with the tty layer. Trust
me, most Alphas have no problem at all with keeping up ;)

Linus