[Question] int3 instruction generates a #UD in an SEV VM

From: wuzongyong
Date: Fri Jul 28 2023 - 23:21:06 EST


Hi,
I am writing firmware in Rust to support SEV, based on the td-shim project [1].
But when I create an SEV VM (just SEV, no SEV-ES and no SEV-SNP) with this firmware,
the Linux kernel crashes because the int3 instruction in int3_selftest() causes a
#UD.
The stack is as follows:
    [    0.141804] invalid opcode: 0000 [#1] PREEMPT SMP
    [    0.141804] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #37
    [    0.141804] RIP: 0010:int3_selftest_ip+0x0/0x2a
    [    0.141804] Code: eb bc 66 90 0f 1f 44 00 00 48 83 ec 08 48 c7 c7 90 0d 78 83 c7 44 24 04 00 00 00 00 e8 23 fe ac fd 85 c0 75 22 48 8d 7c 24 04 <cc> 90 90 90 90 83 7c 24 04 01 75 13 48 c7 c7 90 0d 78 83 e8 42 fc
    [    0.141804] RSP: 0000:ffffffff82803f18 EFLAGS: 00010246
    [    0.141804] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffffffe
    [    0.141804] RDX: ffffffff82fd4938 RSI: 0000000000000296 RDI: ffffffff82803f1c
    [    0.141804] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000fffeffff
    [    0.141804] R10: ffffffff82803e08 R11: ffffffff82f615a8 R12: 00000000ff062350
    [    0.141804] R13: 000000001fddc20a R14: 000000000090122c R15: 0000000002000000
    [    0.141804] FS:  0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
    [    0.141804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    0.141804] CR2: ffff888004c00000 CR3: 000800000281f000 CR4: 00000000003506f0
    [    0.141804] Call Trace:
    [    0.141804]  <TASK>
    [    0.141804]  alternative_instructions+0xe/0x100
    [    0.141804]  check_bugs+0xa7/0x110
    [    0.141804]  start_kernel+0x320/0x430
    [    0.141804]  secondary_startup_64_no_verify+0xd3/0xdb
    [    0.141804]  </TASK>
    [    0.141804] Modules linked in:
    [    0.141804] ---[ end trace 0000000000000000 ]---

Then I tried to track down the problem by running some tests with QEMU and OVMF under SEV,
but the behaviour is also weird when I create an SEV VM with QEMU and OVMF.

I found that the int3 instruction always generates a #UD if I put it before
gen_pool_create() in mce_gen_pool_create(). But if I put the int3 instruction after
gen_pool_create(), it generates a #BP as expected.

    // linux/arch/x86/kernel/cpu/mce/genpool.c
    static int mce_gen_pool_create(void)
    {
        struct gen_pool *tmpp;
        int ret = -ENOMEM;
   
        asm volatile ("int3\n\t"); // generates a #UD
        tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_list)), -1);
        asm volatile ("int3\n\t"); // generates a #BP
        ...
    }

The stack is as follows when I put the int3 instruction before gen_pool_create().

    [    0.094846] invalid opcode: 0000 [#1] PREEMPT SMP
    [    0.094994] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #101
    [    0.094994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
    [    0.094994] RIP: 0010:mcheck_cpu_init+0x4e/0x150
    [    0.094994] Code: 84 c0 0f 89 97 00 00 00 48 8b 45 28 f6 c4 40 0f 84 8a 00 00 00 e8 c2 e6 ff ff 48 89 ef e8 8a e2 ff ff 85 c0 0f 88 94 00 00 00 <cc> e8 dc 05 00 00 85 c0 75 76 80 0d a1 90 0a 02 20 0f b6 45 01 3c
    [    0.094994] RSP: 0000:ffffffff92803ef8 EFLAGS: 00010246
    [    0.094994] RAX: 0000000000000000 RBX: 0000000000000058 RCX: 00000000ffffffff
    [    0.094994] RDX: 0000000000000002 RSI: 00000000000000ff RDI: ffffffff930ed860
    [    0.094994] RBP: ffffffff930ed860 R08: 0000000000000000 R09: 0000000000000000
    [    0.094994] R10: 0000000000000000 R11: 0000000000000254 R12: 0000000000000207
    [    0.094994] R13: 000000001f9ec018 R14: 000000001fe85928 R15: 0000000000000001
    [    0.094994] FS:  0000000000000000(0000) GS:ffff8ae0dca00000(0000) knlGS:0000000000000000
    [    0.094994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    0.094994] CR2: ffff8ae0dac01000 CR3: 000800001881f000 CR4: 00000000003506f0
    [    0.094994] Call Trace:
    [    0.094994]  <TASK>
    [    0.094994]  identify_cpu+0x2cb/0x500
    [    0.094994]  identify_boot_cpu+0x10/0xb0
    [    0.094994]  check_bugs+0xf/0x110
    [    0.094994]  start_kernel+0x320/0x430
    [    0.094994]  secondary_startup_64_no_verify+0xd3/0xdb
    [    0.094994]  </TASK>
    [    0.094994] Modules linked in:
    [    0.094995] ---[ end trace 0000000000000000 ]---

The stack is as follows when I put the int3 instruction after gen_pool_create().

    [    0.095585] int3: 0000 [#1] PREEMPT SMP
    [    0.095590] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #101
    [    0.095593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
    [    0.095594] RIP: 0010:mcheck_cpu_init+0x4f/0x150
    [    0.095597] Code: c0 0f 89 97 00 00 00 48 8b 45 28 f6 c4 40 0f 84 8a 00 00 00 e8 c2 e6 ff ff 48 89 ef e8 8a e2 ff ff 85 c0 0f 88 94 00 00 00 cc <e8> dc 05 00 00 85 c0 75 76 80 0d a1 90 0a 02 20 0f b6 45 01 3c 02
    [    0.095598] RSP: 0000:ffffffff86803ef8 EFLAGS: 00000246
    [    0.095599] RAX: 0000000000000000 RBX: 0000000000000058 RCX: 00000000ffffffff
    [    0.095600] RDX: 0000000000000002 RSI: 00000000000000ff RDI: ffffffff870ed860
    [    0.095601] RBP: ffffffff870ed860 R08: 0000000000000000 R09: 0000000000000000
    [    0.095601] R10: 0000000000000000 R11: 0000000000000254 R12: 0000000000000207
    [    0.095602] R13: 000000001f9ec018 R14: 000000001fe85928 R15: 0000000000000001
    [    0.095604] FS:  0000000000000000(0000) GS:ffff901e5ca00000(0000) knlGS:0000000000000000
    [    0.095605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    0.095606] CR2: ffff901e5ac01000 CR3: 000800001881f000 CR4: 00000000003506f0
    [    0.095606] Call Trace:
    [    0.095611]  <TASK>
    [    0.095612]  identify_cpu+0x2cb/0x500
    [    0.095615]  identify_boot_cpu+0x10/0xb0
    [    0.095618]  check_bugs+0xf/0x110
    [    0.095620]  start_kernel+0x320/0x430
    [    0.095622]  secondary_startup_64_no_verify+0xd3/0xdb
    [    0.095625]  </TASK>
    [    0.095625] Modules linked in:
    [    0.096577] ---[ end trace 0000000000000000 ]---
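In an x86 oops, the byte shown in angle brackets on the Code: line is the byte at the saved RIP. In the #UD dump the marker sits on the int3 byte itself (<cc>, RIP = mcheck_cpu_init+0x4e), while in the #BP dump it sits on the following byte (cc <e8>, RIP = +0x4f). That matches the architectural classification: #UD is a fault, so RIP points at the offending instruction, whereas the #BP raised by int3 is a trap, so RIP points just past the one-byte 0xCC. The sketch below pulls that out of a Code: line; the function names are hypothetical, and it assumes a single-line dump of whitespace-separated hex bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the byte marked as <xx> in an oops "Code:" dump (the byte at the
 * saved RIP), or -1 if there is no marker or a token is not a hex byte.
 * If prev is non-NULL, *prev receives the byte just before the marker
 * (-1 if the marker is the first byte). Hypothetical helper. */
int oops_marked_byte(const char *code, int *prev)
{
    const char *p = strstr(code, "Code:");
    int last = -1;

    p = p ? p + 5 : code; /* skip the "Code:" prefix if present */
    if (prev)
        *prev = -1;
    for (;;) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1; /* reached the end without finding a marker */
        int marked = (*p == '<');
        if (marked)
            p++;
        char *end;
        long v = strtol(p, &end, 16);
        if (end == p)
            return -1; /* malformed token */
        if (marked) {
            if (prev)
                *prev = last;
            return (int)v;
        }
        last = (int)v;
        p = end;
    }
}

/* Heuristic classification: "fault" if RIP points at the int3 (0xcc)
 * itself, as in the #UD dump; "trap" if it points just past it, as in
 * the #BP dump; "unknown" otherwise. */
const char *int3_rip_style(const char *code)
{
    int prev;
    int cur = oops_marked_byte(code, &prev);

    if (cur == 0xcc)
        return "fault";
    if (prev == 0xcc)
        return "trap";
    return "unknown";
}
```

Fed the two mcheck_cpu_init Code: lines above, this reports "fault" for the #UD dump and "trap" for the #BP dump.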
BTW, if I create a normal VM without SEV using QEMU and OVMF, the int3 instruction always
generates a #BP.
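For comparison with the non-SEV behaviour, here is a user-space sketch of the ordinary case, where Linux turns the #BP from int3 into SIGTRAP. The first crash log's Code: bytes (48 8d 7c 24 04 <cc> 90 90 90 90 83 7c 24 04 01, i.e. lea 0x4(%rsp),%rdi; int3; four nops; cmpl $0x1,0x4(%rsp)) suggest int3_selftest() passes the address of a local in RDI and expects the #BP path to set it to 1. The sketch loosely mirrors that: the SIGTRAP handler bumps the int that RDI points to and records the saved RIP. It assumes x86-64 Linux with glibc's ucontext register indices; int3_probe() and its return codes are hypothetical.

```c
#define _GNU_SOURCE /* exposes REG_RIP/REG_RDI in glibc's <ucontext.h> */
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__)
#include <ucontext.h>
/* Fallbacks in case a header was included before _GNU_SOURCE took effect
 * (glibc x86-64 gregset indices). */
#ifndef REG_RIP
#define REG_RIP 16
#endif
#ifndef REG_RDI
#define REG_RDI 8
#endif

static volatile sig_atomic_t got_trap;
static volatile uintptr_t trap_rip;

/* SIGTRAP handler: records the saved RIP and increments the int that the
 * interrupted code left in RDI (modelled loosely on int3_selftest()). */
static void on_sigtrap(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)sig;
    (void)si;
    trap_rip = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    (*(int *)(uintptr_t)uc->uc_mcontext.gregs[REG_RDI])++;
    got_trap = 1;
}
#endif

/* Hypothetical probe. Returns 1 if int3 raised SIGTRAP with RIP one byte
 * past the 0xCC (trap semantics) and the handler updated the value, 0 if
 * no trap was seen, 2 if RIP was elsewhere, -1 if unsupported or setup
 * failed. */
int int3_probe(void)
{
#if defined(__x86_64__) && defined(__linux__)
    struct sigaction sa, old;
    int val = 0;
    uintptr_t ip;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_sigtrap;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGTRAP, &sa, &old) != 0)
        return -1;

    got_trap = 0;
    /* %0 = address of the int3 itself; RDI = &val. The early-clobber "=&r"
     * keeps %0 from being allocated to RDI. */
    __asm__ volatile("lea 0f(%%rip), %0\n\t"
                     "0: int3\n\t"
                     : "=&r"(ip)
                     : "D"(&val)
                     : "memory");

    sigaction(SIGTRAP, &old, NULL);
    if (!got_trap || val != 1)
        return 0;
    return trap_rip == ip + 1 ? 1 : 2;
#else
    return -1;
#endif
}
```

On a non-SEV x86-64 Linux system this is expected to return 1, matching the #BP behaviour seen in the normal VM.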
So I am confused about the behaviour of the int3 instruction. Could anyone help explain it?
Any suggestion is appreciated!

[1] https://github.com/confidential-containers/td-shim

Thanks,
Wu Zongyo