RE: [PATCH] ASoC: Intel: sst: ipc command timeout

From: Lu, Brent
Date: Thu Apr 30 2020 - 11:38:12 EST


>
> Hi,
> yes that seems bit weird. It is bit better as it does not modify common code,
> but still... Maybe going back to your original idea of replacing memcpy, try
> replacing it with readq? It should generate one instruction read (although it is
> only for x64_64, for 32 bit kernel we would still need to do something else).
>
> Thanks,
> Amadeusz

Hi,

I've compared the assembly to see if there is clue. Both kernels are using 64-bit
mov to read register and the only difference is optimized or not. Both
implementations are looking good to me. Currently I don't have answer why
slower kernel hits the problem while optimized one survived.

1. Old kernel. Code is optimized and not able to reproduce the issue on this kernel.

(gdb) disas sst_shim32_read64
Dump of assembler code for function sst_shim32_read64:
0x000000000000096c <+0>: call 0x971 <sst_shim32_read64+5>
=> call __fentry__
0x0000000000000971 <+5>: push rbp
0x0000000000000972 <+6>: mov rbp,rsp
0x0000000000000975 <+9>: mov eax,esi
0x0000000000000977 <+11>: mov rax,QWORD PTR [rdi+rax*1]
=> perform 64-bit mov
0x000000000000097b <+15>: pop rbp
0x000000000000097c <+16>: ret
End of assembler dump.

2. New kernel: obviously optimization is disabled and it calls memcpy to do the read operation.

(gdb) disas sst_shim32_read64
Dump of assembler code for function sst_shim32_read64:
0x00000000000009a8 <+0>: call 0x9ad <sst_shim32_read64+5>
=> call __fentry__
0x00000000000009ad <+5>: push rbp
0x00000000000009ae <+6>: mov rbp,rsp
0x00000000000009b1 <+9>: push rbx
0x00000000000009b2 <+10>: sub rsp,0x10
0x00000000000009b6 <+14>: mov rax,QWORD PTR gs:0x28
0x00000000000009bf <+23>: mov QWORD PTR [rbp-0x10],rax
0x00000000000009c3 <+27>: movabs rax,0xaaaaaaaaaaaaaaaa
0x00000000000009cd <+37>: lea rbx,[rbp-0x18]
0x00000000000009d1 <+41>: mov QWORD PTR [rbx],rax
0x00000000000009d4 <+44>: mov esi,esi
0x00000000000009d6 <+46>: add rsi,rdi
0x00000000000009d9 <+49>: mov edx,0x8
0x00000000000009de <+54>: mov rdi,rbx
0x00000000000009e1 <+57>: call 0x9e6 <sst_shim32_read64+62>
=> call memcpy

The memcpy is implemented in arch/x86/lib/memcpy_64.S

(gdb) disas memcpy
Dump of assembler code for function memcpy:
0xffffffff813519c0 <+0>: jmp 0xffffffff813519f0 <memcpy_orig>
=> jump to memcpy_orig function

X86_FEATURE_ERMS is disabled so it jumps to memcpy_orig

(gdb) disas memcpy_orig
Dump of assembler code for function memcpy_orig:
0xffffffff813519f0 <+0>: mov rax,rdi
0xffffffff813519f3 <+3>: cmp rdx,0x20
0xffffffff813519f7 <+7>: jb 0xffffffff81351a77 <memcpy_orig+135>
=> jump because our read size is 8
...
0xffffffff81351a77 <+135>: cmp edx,0x10
0xffffffff81351a7a <+138>: jb 0xffffffff81351aa0 <memcpy_orig+176>
=> jump because our read size is 8
...
0xffffffff81351aa0 <+176>: cmp edx,0x8
0xffffffff81351aa3 <+179>: jb 0xffffffff81351ac0 <memcpy_orig+208>
0xffffffff81351aa5 <+181>: mov r8,QWORD PTR [rsi]
0xffffffff81351aa8 <+184>: mov r9,QWORD PTR [rsi+rdx*1-0x8]
0xffffffff81351aad <+189>: mov QWORD PTR [rdi],r8
0xffffffff81351ab0 <+192>: mov QWORD PTR [rdi+rdx*1-0x8],r9
=> perform 64-bit mov twice over same address (rdx=0x8)
0xffffffff81351ab5 <+197>: ret

Regards,
Brent