Re: kvm: WARNING in mmu_spte_clear_track_bits

From: Dmitry Vyukov
Date: Thu Mar 23 2017 - 12:39:46 EST


On Tue, Mar 14, 2017 at 4:17 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> 2017-03-12 12:20+0100, Dmitry Vyukov:
>> On Tue, Jan 17, 2017 at 5:00 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> On Tue, Jan 17, 2017 at 4:20 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 13/01/2017 12:15, Dmitry Vyukov wrote:
>>>>>
>>>>> I've commented out the WARNING for now, but I am seeing lots of
>>>>> use-after-frees and RCU stalls involving mmu_spte_clear_track_bits:
>>>>>
>>>>>
>>>>> BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190
>>>>> arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
>>>>> Read of size 8 by task syz-executor2/16715
>>>>> page:ffffea00016e6170 count:0 mapcount:0 mapping: (null) index:0x0
>>>>> flags: 0x500000000000000()
>>>>> raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
>>>>> raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
>>>>> page dumped because: kasan: bad access detected
>>>>> CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>>> Call Trace:
>>>>> __dump_stack lib/dump_stack.c:15 [inline]
>>>>> dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>>>>> kasan_report_error mm/kasan/report.c:213 [inline]
>>>>> kasan_report+0x42d/0x460 mm/kasan/report.c:307
>>>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
>>>>> mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
>>>>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
>>>>> kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
>>>>> kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
>>>>> kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
>>>>> kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
>>>>> kvm_mmu_notifier_invalidate_range_start+0xae/0x140
>>>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
>>>>> __mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
>>>>> mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
>>>>> unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
>>>>> unmap_region+0x2f8/0x560 mm/mmap.c:2460
>>>>> do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
>>>>> mmap_region+0x68f/0x18e0 mm/mmap.c:1612
>>>>> do_mmap+0x6a2/0xd40 mm/mmap.c:1450
>>>>> do_mmap_pgoff include/linux/mm.h:2031 [inline]
>>>>> vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
>>>>> SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
>>>>> SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
>>>>> SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
>>>>> SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
>>>>> entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>> RIP: 0033:0x445329
>>>>> RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
>>>>> RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
>>>>> RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
>>>>> RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
>>>>> R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
>>>>> R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
>>>>> Memory state around the buggy address:
>>>>> ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>> ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ^
>>>>> ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ==================================================================
>>>>
>>>> This could be related to the gfn_to_rmap issues.
>>>
>>>
>>> Humm... That's possible. It seems I am no longer seeing
>>> spte-related crashes after I applied the following patch:
>>>
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -968,8 +968,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>> /* Check for overlaps */
>>> r = -EEXIST;
>>> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
>>> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
>>> - (slot->id == id))
>>> + if (slot->id == id)
>>> continue;
>>> if (!((base_gfn + npages <= slot->base_gfn) ||
>>> (base_gfn >= slot->base_gfn + slot->npages)))
>
> I don't understand how this fixes the test: the only memslot that the
> test creates is at memory range 0x0-0x1000, which should not overlap
> with any private memslots.
> There should be just the IDENTITY_PAGETABLE_PRIVATE_MEMSLOT @
> 0xfffbc000ul.
>
> Do you get any output with this hunk?
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..7e1929432232 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -888,6 +888,14 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> return old_memslots;
> }
>
> +void kvm_dump_slot(struct kvm_memory_slot *slot)
> +{
> + printk("kvm_memory_slot %p { .id = %u, .base_gfn = %#llx, .npages = %lu, "
> + ".userspace_addr = %#lx, .flags = %u, .dirty_bitmap = %p, .arch = ? }\n",
> + slot, slot->id, slot->base_gfn, slot->npages,
> + slot->userspace_addr, slot->flags, slot->dirty_bitmap);
> +}
> +
> /*
> * Allocate some memory and give it an address in the guest physical address
> * space.
> @@ -978,12 +986,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
> /* Check for overlaps */
> r = -EEXIST;
> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
> - (slot->id == id))
> + if (slot->id == id)
> continue;
> if (!((base_gfn + npages <= slot->base_gfn) ||
> - (base_gfn >= slot->base_gfn + slot->npages)))
> + (base_gfn >= slot->base_gfn + slot->npages))) {
> + kvm_dump_slot(&new);
> + kvm_dump_slot(slot);
> goto out;
> + }
> }
> }
>
>
>> Friendly ping. Just hit it on
>
> And the warning happens at mmap ... I can't reproduce, but does the bug
> happen on the second mmap()? (Test line 210 when i = 0.)
>
> The change above makes sense as memslots currently cannot overlap
> anywhere. There are three private memslots that can cause this problem:
> TSS, IDENTITY_MAP and APIC.
>
> TSS and IDENTITY_MAP can be configured by userspace and must not
> conflict by design, so we can safely enforce that.
> APIC memslot doesn't provide such guarantees and should be overlaid over
> any memory, but assuming that userspace doesn't configure memslots there
> seems bearable.
>
> Still, I'd like to understand why that patch would fix this bug.
>
> Thanks.
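
For reference, the overlap test discussed above can be sketched in
userspace C. This is only an illustration under stated assumptions, not
the kernel code: struct slot, slots_overlap(), has_conflict() and the
USER_MEM_SLOTS value are hypothetical stand-ins for struct
kvm_memory_slot, the open-coded condition in __kvm_set_memory_region()
and KVM_USER_MEM_SLOTS. The skip_private flag models the old behaviour,
where slots with id >= KVM_USER_MEM_SLOTS were exempt from the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only (assumption); the kernel uses KVM_USER_MEM_SLOTS. */
#define USER_MEM_SLOTS 509

/* Hypothetical stand-in for the fields of struct kvm_memory_slot used here. */
struct slot {
	int id;
	uint64_t base_gfn;
	uint64_t npages;
};

/*
 * Two half-open ranges [base_gfn, base_gfn + npages) overlap unless one of
 * them ends at or before the start of the other.
 */
static bool slots_overlap(const struct slot *a, const struct slot *b)
{
	return !((a->base_gfn + a->npages <= b->base_gfn) ||
		 (a->base_gfn >= b->base_gfn + b->npages));
}

/*
 * Return true if new_slot collides with any other slot in slots[].
 * skip_private = true mimics the check before the patch (private slots
 * are ignored); skip_private = false mimics the check after it.
 */
static bool has_conflict(const struct slot *new_slot,
			 const struct slot *slots, size_t n,
			 bool skip_private)
{
	for (size_t i = 0; i < n; i++) {
		const struct slot *s = &slots[i];

		if (s->id == new_slot->id)
			continue;
		if (skip_private && s->id >= USER_MEM_SLOTS)
			continue;
		if (slots_overlap(new_slot, s))
			return true;
	}
	return false;
}
```

With a private slot at gfn 0xfffbc (address 0xfffbc000), a user slot
covering 0x0-0x1000 conflicts under neither variant, which matches the
question above about why the patch would affect this test. A user slot
placed on top of gfn 0xfffbc is rejected only when skip_private is false.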


Humm... I cannot reproduce it anymore. Maybe it was fixed by something else...
However, this looks closely related and is still not fixed:
https://groups.google.com/d/msg/syzkaller/IqkesiRS-t0/aLcJuMXqBgAJ
Maybe it's another reincarnation of the same problem...




>> mmotm/86292b33d4b79ee03e2f43ea0381ef85f077c760 (without the above
>> change):
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 31060 at arch/x86/kvm/mmu.c:682
>> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>> CPU: 1 PID: 31060 Comm: syz-executor0 Not tainted 4.11.0-rc1+ #328
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:16 [inline]
>> dump_stack+0x1a7/0x26a lib/dump_stack.c:52
>> panic+0x1f8/0x40f kernel/panic.c:180
>> __warn+0x1c4/0x1e0 kernel/panic.c:541
>> warn_slowpath_null+0x2c/0x40 kernel/panic.c:584
>> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1323
>> mmu_page_zap_pte+0x223/0x350 arch/x86/kvm/mmu.c:2438
>> kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2460 [inline]
>> kvm_mmu_prepare_zap_page+0x1ce/0x13d0 arch/x86/kvm/mmu.c:2504
>> kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:5134 [inline]
>> kvm_mmu_invalidate_zap_all_pages+0x4d4/0x6b0 arch/x86/kvm/mmu.c:5175
>> kvm_arch_flush_shadow_all+0x15/0x20 arch/x86/kvm/x86.c:8364
>> kvm_mmu_notifier_release+0x71/0xb0
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:472
>> __mmu_notifier_release+0x1e5/0x6b0 mm/mmu_notifier.c:75
>> mmu_notifier_release include/linux/mmu_notifier.h:235 [inline]
>> exit_mmap+0x3a3/0x470 mm/mmap.c:2941
>> __mmput kernel/fork.c:890 [inline]
>> mmput+0x228/0x700 kernel/fork.c:912
>> exit_mm kernel/exit.c:558 [inline]
>> do_exit+0x9e8/0x1c20 kernel/exit.c:866
>> do_group_exit+0x149/0x400 kernel/exit.c:983
>> get_signal+0x6d9/0x1840 kernel/signal.c:2318
>> do_signal+0x94/0x1f30 arch/x86/kernel/signal.c:808
>> exit_to_usermode_loop+0x1e5/0x2d0 arch/x86/entry/common.c:157
>> prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
>> syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
>> entry_SYSCALL_64_fastpath+0xc0/0xc2
>> RIP: 0033:0x4458d9
>> RSP: 002b:00007ffa472c3b58 EFLAGS: 00000286 ORIG_RAX: 00000000000000ce
>> RAX: fffffffffffffff4 RBX: 0000000000708000 RCX: 00000000004458d9
>> RDX: 0000000000000000 RSI: 000000002006bff8 RDI: 000000000000a05b
>> RBP: 0000000000000fe0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006df0a0
>> R13: 000000000000a05b R14: 000000002006bff8 R15: 0000000000000000