Wake-up from suspend to RAM broken under `retbleed=stuff`

From: Joan Bruguera
Date: Sat Jan 07 2023 - 22:07:59 EST


Wake-up from suspend to RAM seems broken under `retbleed=stuff`
(the recently introduced call depth tracking mitigation, see:
https://lore.kernel.org/lkml/f9fd86acac4f49bc8f90b403978e9df3@xxxxxxxxxxxxxxxx/t/)
I can replicate it on both real hardware and QEMU (with and without KVM).

I can reproduce it by booting a fairly standard mainline kernel
on QEMU with `init=/bin/bash` and then suspending to RAM with:
    echo "deep" > /sys/power/mem_sleep
    echo "mem" > /sys/power/state
Then executing `system_wakeup` on the QEMU monitor causes the crash.

Some tracing with QEMU shows that some of the instrumentation
(`INCREMENT_CALL_DEPTH` / `sarq $5, %gs:__x86_call_depth`, ...)
seems to be done before %gs has been set up, causing a fault.

The crash happens shortly after the jump to `restore_processor_state`
from `wakeup_64.S`, on a `sarq $5, %gs:__x86_call_depth` instruction.
That code path probably needs to be excluded from the instrumentation?

I can also see other instances of `sarq $5, ...` executed before
the one that causes the crash, which look suspicious as well.

QEMU log before the crash:

...
0xffffffff9486dc89: 4c 8b 68 10 movq 0x10(%rax), %r13
0xffffffff9486dc8d: 4c 8b 70 08 movq 8(%rax), %r14
0xffffffff9486dc91: 4c 8b 38 movq (%rax), %r15
0xffffffff9486dc94: 31 c0 xorl %eax, %eax
0xffffffff9486dc96: 48 83 c4 08 addq $8, %rsp
# (This is the 'jmp restore_processor_state' on wakeup_64.S)
0xffffffff9486dc9a: e9 51 e5 c2 00 jmp 0xffffffff9549c1f0
0xffffffff9549c1f0: 66 0f 1f 00 nopw (%rax)
0xffffffff9549c1f4: 55 pushq %rbp
0xffffffff9549c1f5: 48 89 e5 movq %rsp, %rbp
0xffffffff9549c1f8: 41 57 pushq %r15
0xffffffff9549c1fa: 41 56 pushq %r14
0xffffffff9549c1fc: 41 55 pushq %r13
0xffffffff9549c1fe: 41 54 pushq %r12
0xffffffff9549c200: 53 pushq %rbx
0xffffffff9549c201: 48 83 ec 20 subq $0x20, %rsp
0xffffffff9549c205: 80 3d 10 93 c4 01 00 cmpb $0, 0x1c49310(%rip)
0xffffffff9549c20c: 0f 85 8b 02 00 00 jne 0xffffffff9549c49d
0xffffffff9549c49d: 48 8b 15 24 90 c4 01 movq 0x1c49024(%rip), %rdx
0xffffffff9549c4a4: bf a0 01 00 00 movl $0x1a0, %edi
0xffffffff9549c4a9: 89 d6 movl %edx, %esi
0xffffffff9549c4ab: 48 c1 ea 20 shrq $0x20, %rdx
0xffffffff9549c4af: e8 32 9c 3e ff callq 0xffffffff948860e6
0xffffffff948860e6: 65 48 c1 3c 25 90 29 03 sarq $5, %gs:0x32990
0xffffffff948860ee: 00 05

RAX=0000000000000000 RBX=ffff98fbc1295d00 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000001 RDI=00000000000001a0 RBP=ffffaace80013ca0 RSP=ffffaace80013c50
R8 =0000000000000004 R9 =0000000021bf694e R10=00000000aaaaaaab R11=0000000000000005
R12=0000000000000000 R13=0000000000000000 R14=0000000000000004 R15=ffff98fbc4ae2560
RIP=ffffffff948860e6 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffff98fbfec0b000 0000007f
IDT= ffffffff96604000 000001ff
CR0=80050033 CR2=000055b505d6fd48 CR3=0000000001bfa000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=0000000000000000 CCO=SARQ
EFER=0000000000000d01
check_exception old: 0xffffffff new 0xe
0: v=0e e=0000 i=0 cpl=0 IP=0010:ffffffff948860e6 pc=ffffffff948860e6 SP=0018:ffffaace80013c50 CR2=0000000000032990
RAX=0000000000000000 RBX=ffff98fbc1295d00 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000001 RDI=00000000000001a0 RBP=ffffaace80013ca0 RSP=ffffaace80013c50
R8 =0000000000000004 R9 =0000000021bf694e R10=00000000aaaaaaab R11=0000000000000005
R12=0000000000000000 R13=0000000000000000 R14=0000000000000004 R15=ffff98fbc4ae2560
RIP=ffffffff948860e6 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffff98fbfec0b000 0000007f
IDT= ffffffff96604000 000001ff
CR0=80050033 CR2=0000000000032990 CR3=0000000001bfa000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=0000000000000000 CCO=SARQ
EFER=0000000000000d01
check_exception old: 0xe new 0xd
1: v=08 e=0000 i=0 cpl=0 IP=0010:ffffffff948860e6 pc=ffffffff948860e6 SP=0018:ffffaace80013c50 env->regs[R_EAX]=0000000000000000
RAX=0000000000000000 RBX=ffff98fbc1295d00 RCX=0000000000000000 RDX=0000000000000000
RSI=0000000000000001 RDI=00000000000001a0 RBP=ffffaace80013ca0 RSP=ffffaace80013c50
R8 =0000000000000004 R9 =0000000021bf694e R10=00000000aaaaaaab R11=0000000000000005
R12=0000000000000000 R13=0000000000000000 R14=0000000000000004 R15=ffff98fbc4ae2560
RIP=ffffffff948860e6 RFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT= ffff98fbfec0b000 0000007f
IDT= ffffffff96604000 000001ff
CR0=80050033 CR2=0000000000032990 CR3=0000000001bfa000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=0000000000000000 CCO=SARQ
EFER=0000000000000d01
check_exception old: 0x8 new 0xd
Triple fault
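
As a sanity check on the trace above, here is a small Python sketch that
decodes the faulting instruction bytes and the preceding `callq` by hand.
It is not part of the QEMU output; the helper names (`decode_gs_sar`,
`call_target`) are made up for illustration, and the decoder only handles
the exact encodings that appear in this log. It shows that the `sarq`
operand is a GS-relative absolute displacement, so with the GS base of 0
from the register dump the access lands on the address reported in CR2.

```python
import struct

# Bytes copied from the log for the instruction at 0xffffffff948860e6.
SAR_INSN = bytes.fromhex("65 48 c1 3c 25 90 29 03 00 05")
# Bytes copied from the log for the callq at 0xffffffff9549c4af.
CALL_ADDR = 0xFFFFFFFF9549C4AF
CALL_INSN = bytes.fromhex("e8 32 9c 3e ff")


def decode_gs_sar(insn):
    """Decode only the form seen in the log: gs: REX.W C1 /7 ib, disp32-only SIB."""
    seg, rex, opcode, modrm, sib = insn[:5]
    assert seg == 0x65       # GS segment override prefix
    assert rex == 0x48       # REX.W, 64-bit operand size
    assert opcode == 0xC1    # shift group, r/m64 by imm8
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert (mod, reg, rm) == (0, 7, 4)   # /7 selects SAR, rm=4 means SIB follows
    index, base = (sib >> 3) & 7, sib & 7
    assert (index, base) == (4, 5)       # no index, no base: bare disp32
    (disp,) = struct.unpack_from("<i", insn, 5)
    return disp, insn[9]                 # displacement and shift count


def call_target(addr, insn):
    """Target of a near relative call (E8 rel32) located at addr."""
    assert insn[0] == 0xE8 and len(insn) == 5
    (rel,) = struct.unpack_from("<i", insn, 1)
    return (addr + len(insn) + rel) & 0xFFFFFFFFFFFFFFFF


disp, shift = decode_gs_sar(SAR_INSN)
gs_base = 0x0  # base field of the GS line in the register dump
fault_addr = (gs_base + disp) & 0xFFFFFFFFFFFFFFFF

print(f"sarq ${shift}, %gs:{disp:#x} -> accesses {fault_addr:#x}")
print(f"callq target: {call_target(CALL_ADDR, CALL_INSN):#x}")
```

The computed access address can be compared against the CR2 value in the
`v=0e` line, and the call target against the address of the faulting `sarq`.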