If byte writes are used at all, they should always come last,
to pick up any odd trailing byte. I think you found a bug, even
though whoever made the revision to memcpy probably thinks they
did something 'cool'. This is an example of cute code causing
problems. The classic example of a proper memcpy() that uses
the ix86 built-in string instructions (rep movs) runs like this:
        pushl   %esi                # Save precious registers
        pushl   %edi
        movl    COUNT(%esp),%ecx    # Byte count (offsets are taken after the pushes)
        movl    SOURCE(%esp),%esi
        movl    DEST(%esp),%edi
        cld                         # Copy upwards
        shrl    $1,%ecx             # Make WORDS, possibly set carry
        rep     movsw               # Copy the words (leaves ecx = 0, carry intact)
        adcl    %ecx,%ecx           # ecx = carry: 1 if COUNT was odd, else 0
        rep     movsb               # Copy any spare byte
        popl    %edi                # Restore precious registers
        popl    %esi
Note that there isn't any code for moving dwords because the
chances of gaining anything are slim (alignment may hurt).
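
For anyone who reads C more easily than AT&T syntax, here is a
rough sketch of the same ordering: whole words first, then the
single odd byte, if any, last. The name copy_words_then_byte is
made up for illustration, and the pair of byte assignments only
stands in for one movsw. This is a sketch of the idea, not the
kernel's memcpy.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: mirrors the
 * word-then-byte order of the assembly fragment. */
static void *copy_words_then_byte(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t words = count >> 1;  /* shrl $1,%ecx: number of 16-bit words */
    size_t odd = count & 1;     /* the bit shrl shifts out into carry */

    while (words--) {           /* rep movsw */
        d[0] = s[0];            /* two byte moves model one word move, */
        d[1] = s[1];            /* so no alignment is assumed here */
        d += 2;
        s += 2;
    }
    if (odd)                    /* adcl %ecx,%ecx ; rep movsb */
        *d = *s;                /* the byte write always comes last */

    return dest;
}
```

With count = 7 this does three word copies and then one byte
copy. With count = 4 the byte step is skipped entirely.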