> 805dc: 6a96 move.d $r10,$r9
While I can understand that certain architectures may benefit from that alteration, I am curious as to what SPECIFICALLY it is doing that is different. How do they differ?
I do not know if this will give you anything, but here is
the disassembled CRIS version of your first while loop (note that the branch instructions have one delay slot,
and that dest, src and count are in $r10, $r11 and $r12):
while (count && *src)
805d8: 6cc6 test.d $r12
805da: 1830 beq 805f4
805dc: 6a96 move.d $r10,$r9
805de: 8b0b test.b [$r11]
805e0: 1230 beq 805f4
805e2: 0f05 nop
{
count--;
805e4: 81c2 subq 1,$r12
*tmp++ = *src++;
805e6: 4bde move.b [$r11+],$r13
805e8: 6cc6 test.d $r12
805ea: 0830 beq 805f4
805ec: c9df move.b $r13,[$r9+]
805ee: 8b0b test.b [$r11]
805f0: f320 bne 805e4
805f2: 0f05 nop
}
And here is my first while loop:
while (count)
8062c: 6cc6 test.d $r12
8062e: 1030 beq 80640
80630: 6a96 move.d $r10,$r9
{
count--;
if (!(*tmp++ = *src++))
80632: 4bde move.b [$r11+],$r13
80634: c9db move.b $r13,[$r9]
80636: 890f test.b [$r9+]
80638: 0630 beq 80640
8063a: 81c2 subq 1,$r12
break;
8063c: f520 bne 80632
8063e: 0f05 nop
}
Also note that your version
of the second loop needs an explicit comparison with -1,
whereas mine uses an implicit comparison with 0.
I don't understand why you say I need an explicit comparison with -1. My first loop exits either with the number of bytes remaining in the buffer or with zero if it has copied all count bytes.
I was talking about the second loop. The object code the compiler
produces for your version actually tests the count variable after
it decreases it, which is why it tests for -1.
The second loop WOULD require a comparison with -1 IF the "count--" were in the loop condition rather than inside the loop body. As it IS in the loop body, there is no need for that. My second loop has an implicit comparison against zero.
Hmm, your second loop from above looks like:
while (n--) {
*s++ = 0;
}
whereas mine looks like:
while (count) {
*tmp++ = '\0';
count--;
}
You seem to be referring to my version, where what you say is true.
I agree that this definitely makes the code look more elegant, and I would prefer what you have done here. But what puzzles me is that it is functionally and logically equivalent to my code.
So, this code:
for (A; B; C) { ... }
is the same as this:
A;
while (B) {
...
C;
}
So why is it that this mere syntactic difference causes the compiler to produce a better result?
I wish I knew. Actually in the CRIS case, it seems to be an
optimizer thing. If I change your first loop from
while (n && *s2) {
n--;
*s++ = *s2++;
}
to
while (n && *s2) {
*s++ = *s2++;
n--;
}
it gives the expected object code, i.e., the same as my
first for loop. So here is a modified version of your
code that gives exactly the same object code (for CRIS) as my version with the for loops:
char *strncpy(char * s1, const char * s2, size_t n)
{
register char *s = s1;
while (n && *s2) {
*s++ = *s2++;
n--;
}
while (n) {
*s++ = 0;
n--;
}
return s1;
}
//Peter