On Fri, 18 Apr 2003, Jeff Garzik wrote:
> Richard B. Johnson wrote:
> > On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >>Linus Torvalds wrote:
> >>>On Fri, 18 Apr 2003, Jeff Garzik wrote:
> >>>>You should save the strlen result to a temp var, and then s/strcpy/memcpy/
> >>>No, you should just not do this. I don't see the point.
> >>strcpy has a test for each byte of its contents, and memcpy doesn't.
> >>Why search 's' for NULL twice?
> >> Jeff
> > Because it doesn't. strcpy() is usually implemented by getting
> > the string-length, using the same code sequence as strlen(), then
> > using the same code sequence as memcpy(), but copying the null-byte
> > as well. The check for the null-byte is done in the length routine.
> > If you do a memcpy(a, b, strlen(b));, then you are making two
> > procedure calls and dirtying the cache twice.
> Wrong, because we have to call strlen _anyway_, to provide the size to
> > A typical Intel procedure, stripped of the push/pops to save
> > registers is here....
> That's kinda cute. Why not submit a patch to the strcpy implementation
> in include/asm-i386/string.h? :) Ours is shorter, but does have a jump:
> "testb %%al,%%al\n\t"
> "jne 1b"
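Jeff's suggestion from the start of the thread, keeping the strlen() result and replacing strcpy() with memcpy(), can be sketched in portable C as follows. The sketch assumes the length is needed anyway to size an allocation. The quoted sentence above is cut off, so `malloc` is only a stand-in here. `dup_string` is a hypothetical helper name, not code from the patch being discussed. The copy must be `len + 1` bytes so that the terminating NUL is included. `memcpy(a, b, strlen(b))` alone would not copy the terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): duplicate src into a new
 * buffer. malloc() stands in for whatever the length was needed for.
 * strlen() scans for the NUL exactly once; its result is reused so that
 * memcpy() can copy without testing each byte a second time. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);        /* the single scan for the NUL */
    char *dst = malloc(len + 1);     /* +1 for the terminator */

    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);       /* len + 1: copy the NUL as well */
    return dst;
}
```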
Years ago I did submit patches of all kinds, mostly 'asm' stuff,
because I have been doing assembly for over 20 years. However,
I got tired of being shot down in flames by persons who haven't
a clue, so I stopped doing that.
The history of the stuff in asm-i386 is full of changes, with
many bugs introduced by persons who tried to save a nanosecond
here and there. In recent times (the past two years) somebody changed
the stuff back to code which was not optimum, but quite obviously correct.
I might 'risk' sending in some patches again. Maybe.
> Which is better? I don't know; I'm still learning the performance
> eccentricities of x86 insns on various processors.
The test for every byte transferred is, quite obviously, correct.
It is also, quite obviously, not optimum.
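One standard way to avoid a branch on every byte is to test a whole 32-bit word for a zero byte with a few ALU operations. The sketch below shows that well-known bit trick. It is not taken from this thread or from include/asm-i386/string.h, and `has_zero_byte` is a hypothetical name. A real word-at-a-time strcpy() built on it must also avoid reading past the end of the source. This is usually done by loading only aligned words, so that a load never crosses into an unmapped page.

```c
#include <stdint.h>

/* Hypothetical helper illustrating a well-known bit trick: returns
 * nonzero iff at least one of the four bytes of v is 0x00.
 * Subtracting 0x01010101 turns a 0x00 byte into one with its high bit
 * set (it borrows); "& ~v" drops bytes whose high bit was already set
 * in v, and "& 0x80808080" keeps only each byte's high bit. A borrow can
 * also flag a byte above a genuine zero byte, but the result is nonzero
 * exactly when some byte of v is zero. */
int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}
```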
> Related x86 question: if the memory buffer is not dword-aligned, is
> 'rep movsl' the best idea? On RISC it's usually smarter to unroll the
> head of the loop to avoid unaligned accesses; but from reading x86 asm
> code in the kernel, nobody seems to care about that. Is the
> unaligned-access penalty so small that the increased code size of the
> head-unroll is never worth it?
Unaligned access does carry a penalty. On early i586 machines it was
horrible: it doubled the access time. On the i486 and, later, on i686
machines, the access times do not change as radically. Many
of the changes to the string and memory 'asm' routines that were
later reverted were made when the 'awful' i586
came out. Of course, unaligned access on the i586 was
still faster than aligned access on the i486 (because of the
higher clock speeds). However, it was worth the trouble to improve
the assembly routines at that time.
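For comparison, the RISC-style "head unroll" Jeff asks about can be sketched in portable C. The sketch copies single bytes until the destination is word-aligned, then copies 32-bit words, then copies the remaining tail bytes. `copy_aligned_head` is a hypothetical name, and the 32-bit word size is an assumption. This is not the kernel's memcpy(). On x86, the alternative is to issue `rep movsl` on unaligned data and accept the penalty described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a head-unrolled copy (not kernel code):
 * bytes until dst is 4-byte aligned, then 32-bit words, then the tail.
 * Word loads/stores go through fixed-size memcpy() calls, which
 * optimizing compilers typically lower to single load/store
 * instructions, and which avoid C alignment and aliasing problems
 * on the (possibly unaligned) source. */
void *copy_aligned_head(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is word-aligned. */
    while (n > 0 && ((uintptr_t)d % sizeof(uint32_t)) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Body: whole words; only the destination is known to be aligned. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;

        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    /* Tail: the remaining 0-3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```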
> > A lot of persons who are unfamiliar with tools other than 'C' think
> > that strcpy() is made like this:
> > while (*dst++ = *src++)
> > ;
> In fact, that's basically the kernel's non-arch-specific implementation :)
Yep. Naive code looks so 'simple', must be "optimum", no? ;^).
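For reference, here is the naive loop quoted above wrapped in a complete function that returns the destination pointer, as the standard strcpy() does. `naive_strcpy` is a hypothetical name. Each iteration performs one load, one store and one test for the NUL byte, which is the per-byte test discussed earlier in the thread.

```c
/* Hypothetical name; the loop body is the naive one quoted above.
 * The explicit "!= '\0'" marks the assignment in the condition as
 * intentional. */
char *naive_strcpy(char *dst, const char *src)
{
    char *ret = dst;

    while ((*dst++ = *src++) != '\0')
        ;                       /* one load, one store, one test per byte */
    return ret;
}
```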
Penguin : Linux version 2.4.20 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
This archive was generated by hypermail 2b29 : Wed Apr 23 2003 - 22:00:25 EST