Re: kvm: GPF in kvm_lapic_latched_init

From: Dmitry Vyukov
Date: Fri Jan 15 2016 - 15:09:48 EST


On Fri, Jan 15, 2016 at 8:59 PM, Jeff Merkey <linux.mdb@xxxxxxxxx> wrote:
> On 1/8/16, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> Hello,
>>
>> The following program triggers GPF in kvm_lapic_latched_init if run in
>> a parallel loop:
>> https://gist.githubusercontent.com/dvyukov/524b398f379440b21115/raw/9627095f57a72501fb51bf7565471d31732beeee/gistfile1.txt
>>
>> kasan: GPF could be caused by NULL-ptr deref or user memory
>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 3 PID: 14426 Comm: a.out Not tainted 4.4.0-rc8+ #217
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> task: ffff880061099780 ti: ffff880062e30000 task.ti: ffff880062e30000
>> RIP: 0010:[<ffffffff81057171>] [<ffffffff81057171>]
>> kvm_arch_vcpu_ioctl+0xa31/0x2ef0
>> RSP: 0018:ffff880062e37900 EFLAGS: 00010206
>> RAX: dffffc0000000000 RBX: 1ffff1000c5c6f25 RCX: 1ffff1000c41b7cb
>> RDX: 000000000000001e RSI: 000000008040ae9f RDI: 00000000000000f0
>> RBP: ffff880062e37c10 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> R13: 0000000000000000 R14: ffff880062e37be8 R15: 0000000000000000
>> FS: 00007f4aa815f700(0000) GS:ffff88006d700000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 00007f4aa795de78 CR3: 00000000613c2000 CR4: 00000000000026e0
>> Stack:
>> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> 0000000020006fe4 0000000041b58ab3 ffffffff86e2e588 ffffffff81056740
>> 0000000000000001 ffff880061099f60 0000000000000498 ffff880061099f68
>> Call Trace:
>> [<ffffffff8101cb52>] kvm_vcpu_ioctl+0x1e2/0xd00
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2526
>> [< inline >] vfs_ioctl fs/ioctl.c:43
>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>> arch/x86/entry/entry_64.S:185
>> Code: 85 2d 20 00 00 4d 8b a4 24 60 03 00 00 e8 c8 8b 50 00 49 8d bc
>> 24 f0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 f3 1f 00 00 4d 8b a4 24 f0 00 00 00 41 83 e4
>> RIP [< inline >] constant_test_bit
>> ./arch/x86/include/asm/bitops.h:311
>> RIP [< inline >] kvm_lapic_latched_init arch/x86/kvm/lapic.h:164
>> RIP [< inline >] kvm_vcpu_ioctl_x86_get_vcpu_events
>> arch/x86/kvm/x86.c:2936
>> RIP [<ffffffff81057171>] kvm_arch_vcpu_ioctl+0xa31/0x2ef0
>> arch/x86/kvm/x86.c:3347
>> RSP <ffff880062e37900>
>> ---[ end trace 16449377928e034b ]---
>>
>>
>> or:
>>
>> kasan: GPF could be caused by NULL-ptr deref or user memory
>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 0 PID: 9555 Comm: syz-executor Not tainted 4.4.0-rc8+ #217
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> task: ffff88006301de00 ti: ffff880062568000 task.ti: ffff880062568000
>> RIP: 0010:[<ffffffff810cf5ab>] [<ffffffff810cf5ab>]
>> wait_lapic_expire+0x6b/0x560
>> RSP: 0018:ffff88006256fa48 EFLAGS: 00010006
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88006301e5c8
>> RDX: 0000000000000011 RSI: 0000000000000000 RDI: ffff880033590360
>> RBP: ffff88006256fa88 R08: 0000000000000001 R09: 0000000000000002
>> R10: 0000000000000001 R11: 0000000000000001 R12: ffff880033590000
>> R13: ffff880033590030 R14: 0000000000000088 R15: ffff88003359002c
>> FS: 00007f4809354700(0000) GS:ffff88003ec00000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f4808b53000 CR3: 0000000033f3f000 CR4: 00000000000026f0
>> Stack:
>> ffff88006256fa70 0000000000000082 0000000000000003 ffff88006301de00
>> ffff880033590030 ffff880033590030 ffff880033590000 ffff88003359002c
>> ffff88006256fc10 ffffffff8106a1dc ffffffff8106a75b 0000000000013210
>> Call Trace:
>> [< inline >] vcpu_enter_guest arch/x86/kvm/x86.c:6523
>> [< inline >] vcpu_run arch/x86/kvm/x86.c:6660
>> [<ffffffff8106a1dc>] kvm_arch_vcpu_ioctl_run+0x25ec/0x5820
>> arch/x86/kvm/x86.c:6818
>> [<ffffffff8101cf61>] kvm_vcpu_ioctl+0x5f1/0xd00
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2375
>> [< inline >] vfs_ioctl fs/ioctl.c:43
>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>> arch/x86/entry/entry_64.S:185
>> Code: 60 03 00 00 0f 1f 44 00 00 e8 92 07 49 00 4c 8d b3 88 00 00 00
>> e8 86 07 49 00 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 d8 04 00 00 4c 8b ab 88 00 00 00 4d 85 ed 75
>> RIP [<ffffffff810cf5ab>] wait_lapic_expire+0x6b/0x560
>> arch/x86/kvm/lapic.c:1245
>> RSP <ffff88006256fa48>
>> ---[ end trace 560c2b85e36670bc ]---
>>
>> or:
>>
>> kasan: GPF could be caused by NULL-ptr deref or user memory
>> accessgeneral protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 3 PID: 11264 Comm: syz-executor Not tainted 4.4.0-rc8+ #217
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> 01/01/2011
>> task: ffff880064d55e00 ti: ffff880064dc0000 task.ti: ffff880064dc0000
>> RIP: 0010:[<ffffffff810d138d>] [<ffffffff810d138d>]
>> apic_has_pending_timer+0x7d/0x210
>> RSP: 0018:ffff880064dc7a60 EFLAGS: 00010206
>> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000004
>> RDX: 0000000000000017 RSI: 0000000000000000 RDI: 00000000000000b8
>> RBP: ffff880064dc7a70 R08: 0000000000000002 R09: 0000000000000001
>> R10: ffff880064d55e00 R11: ffff880063528220 R12: ffff880063250030
>> R13: ffff880063250030 R14: ffff880063250000 R15: 0000000000000000
>> FS: 00007fb05f305700(0000) GS:ffff88006d700000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00000000006d7760 CR3: 0000000065ae9000 CR4: 00000000000026e0
>> Stack:
>> ffff880063250000 ffff880063250030 ffff880064dc7a88 ffffffff810c7af5
>> ffffffff86fee5c0 ffff880064dc7c10 ffffffff810685d4 ffffffff8106a75b
>> 0000000000013210 ffff880065a35000 1ffff1000c9b8f59 ffff880064dc0008
>> Call Trace:
>> [<ffffffff810c7af5>] kvm_cpu_has_pending_timer+0x15/0x20
>> arch/x86/kvm/irq.c:36
>> [< inline >] vcpu_run arch/x86/kvm/x86.c:6669
>> [<ffffffff810685d4>] kvm_arch_vcpu_ioctl_run+0x9e4/0x5820
>> arch/x86/kvm/x86.c:6818
>> [<ffffffff8101cf61>] kvm_vcpu_ioctl+0x5f1/0xd00
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2375
>> [< inline >] vfs_ioctl fs/ioctl.c:43
>> [<ffffffff817b36b1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
>> [< inline >] SYSC_ioctl fs/ioctl.c:622
>> [<ffffffff817b3eff>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
>> [<ffffffff85e745b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
>> arch/x86/entry/entry_64.S:185
>> Code: ba e9 48 00 0f 1f 44 00 00 e8 b0 e9 48 00 e8 ab e9 48 00 48 8d
>> bb b8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80>
>> 3c 02 00 0f 85 46 01 00 00 4c 8b a3 b8 00 00 00 48 b8 00 00
>> RIP [< inline >] arch_static_branch
>> ./arch/x86/include/asm/jump_label.h:21
>> RIP [< inline >] static_key_false include/linux/jump_label.h:133
>> RIP [< inline >] kvm_apic_hw_enabled arch/x86/kvm/lapic.h:117
>> RIP [< inline >] apic_enabled arch/x86/kvm/lapic.c:121
>> RIP [<ffffffff810d138d>] apic_has_pending_timer+0x7d/0x210
>> arch/x86/kvm/lapic.c:1731
>> RSP <ffff880064dc7a60>
>> ---[ end trace fe9c10b88e48c946 ]---
>>
>>
>> All crashes suggest that apic is NULL.
>>
>> On commit b06f3a168cdcd80026276898fd1fee443ef25743 (Jan 6).
>>
>
> Dmitry,
>
> You need to check your test harness and add checks for which CPL the
> kernel is running at for these GPF faults and add that to your report.
> I realize that there are a lot of kernel subsystems which are coded
> very loose on checking for this stuff. I have looked through some of
> these hangs you reported and I think one of them is related to a
> swapgs instruction getting nested, and two others related to code
> touching hardware.
>
> Can you figure out how to send the info as to what privilege level you
> are at when these faults occur? This one looks like swapgs got nested
> and gs was pointing off to oblivion.


The program opens /dev/kvm as root because the device node has mode 700,
but then issues the ioctls as user nobody.
Would it make sense to add the UID to kernel BUG/WARNING reports (or at
least a capable(CAP_SYS_ADMIN) flag)? It is a pretty generic concern
for all crashes.
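
For reference, the privilege pattern described above looks roughly like
this. It is a minimal sketch, not the actual reproducer (that is in the
gist linked in the original report); the helper name
open_then_drop_privs is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with the caller's current (possibly
 * root) credentials, then switch the process to uid/gid.
 * Returns the open descriptor, or -errno on failure.
 *
 * The descriptor stays usable after the switch because file permissions
 * are checked at open() time, not on each later ioctl().
 */
int open_then_drop_privs(const char *path, uid_t uid, gid_t gid)
{
    int err;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -errno;

    /* When actually leaving root, also clear the supplementary groups. */
    if (geteuid() == 0 && uid != 0 && setgroups(0, NULL) != 0)
        goto fail;

    /* Order matters: change the gid first, while that is still permitted. */
    if (setgid(gid) != 0 || setuid(uid) != 0)
        goto fail;

    return fd;

fail:
    err = errno;
    close(fd);
    return -err;
}
```

Something like open_then_drop_privs("/dev/kvm", pw->pw_uid, pw->pw_gid),
with pw taken from getpwnam("nobody"), would match the setup described:
every ioctl on the returned fd then runs without root, even though the
open itself needed it.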