Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

From: Linus Torvalds
Date: Thu Aug 17 2023 - 11:32:29 EST


On Thu, 17 Aug 2023 at 17:16, David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> gcc for x86-64 make a pigs-breakfast when the bitfields are 'char'
> and loads the constant from memory using pc-relative access.

I think your godbolt tests must be with some other model than what the
kernel uses.

For example, for me, iov_iter_init generates

testl %esi, %esi # direction test
movb $1, (%rdi) # bitfields
movq $0, 8(%rdi)
movq %rdx, 16(%rdi)
movq %r8, 24(%rdi)
movq %rcx, 32(%rdi)
setne 1(%rdi) # set the direction byte

with my patch for me. Which is pretty much optimal.

*Without& the patch, I get

movzwl .LC1(%rip), %eax
testl %esi, %esi
movb $0, (%rdi)
movb $1, 4(%rdi)
movw %ax, 1(%rdi)
movq $0, 8(%rdi)
movq %rdx, 16(%rdi)
movq %r8, 24(%rdi)
movq %rcx, 32(%rdi)
setne 3(%rdi)

which is that disgusting "move two bytes from memory", and makes
absolutely no sense as a way to "write 2 zero bytes":

.LC1:
.byte 0
.byte 0

I think that's some odd gcc bug, actually.


> Otherwise pretty must all variants (with or without the bitfield)
> get initialised in a single write.

So there may well be some odd gcc code generation issue that is
triggered by the fact that we use an initializer to set those things,
and we then have two bytes (with my patch) followed by a hole, or
three bytes (without it) followed by a hole.

But that bitfield thing improves things at least for me. I think the
example code you used for godbolt is actually something else than what
the kernel does.

Linus