RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

From: Michael Kelley
Date: Thu Aug 05 2021 - 14:08:23 EST


From: David Mozes <david.mozes@xxxxxxx>
>
> Hi,
> The problem is happening to me very frequently on kernel 4.19.195
>

David -- could you give us a little more context? Were you running earlier
4.19.xxx versions without seeing this problem? There was a timing
problem in hyperv_flush_tlb_others() that was fixed in early January
2021. The fix was backported to the 4.19 longterm tree, and should
be included in 4.19.195. Outside of that, I'm not aware of a problem
in this area.

For completeness, what version of Hyper-V are you using? And how
many vCPUs does your VM have?

Michael

>
>
> Aug 4 03:59:01 c-node04 kernel: [36976.388554] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others+0xec9/0x1640
> Aug 4 03:59:01 c-node04 kernel: [36976.388556] Read of size 4 at addr ffff889e5e127440 by task ps/52478
> Aug 4 03:59:01 c-node04 kernel: [36976.388556]
> Aug 4 03:59:01 c-node04 kernel: [36976.388560] CPU: 4 PID: 52478 Comm: ps Kdump: loaded Tainted: G W OE 4.19.195-KM9 #1
> Aug 4 03:59:01 c-node04 kernel: [36976.388562] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
> Aug 4 03:59:01 c-node04 kernel: [36976.388562] Call Trace:
> Aug 4 03:59:01 c-node04 kernel: [36976.388569] dump_stack+0x11d/0x1a7
> Aug 4 03:59:01 c-node04 kernel: [36976.388572] ? dump_stack_print_info.cold.0+0x1b/0x1b
> Aug 4 03:59:01 c-node04 kernel: [36976.388576] ? percpu_ref_tryget_live+0x2f0/0x2f0
> Aug 4 03:59:01 c-node04 kernel: [36976.388580] ? rb_erase_cached+0xc4c/0x2880
> Aug 4 03:59:01 c-node04 kernel: [36976.388584] ? printk+0x9f/0xc5
> Aug 4 03:59:01 c-node04 kernel: [36976.388585] ? snapshot_ioctl.cold.1+0x74/0x74
> Aug 4 03:59:01 c-node04 kernel: [36976.388590] print_address_description+0x65/0x22e
> Aug 4 03:59:01 c-node04 kernel: [36976.388592] kasan_report.cold.6+0x243/0x2ff
> Aug 4 03:59:01 c-node04 kernel: [36976.388594] ? hyperv_flush_tlb_others+0xec9/0x1640
> Aug 4 03:59:01 c-node04 kernel: [36976.388596] hyperv_flush_tlb_others+0xec9/0x1640
> Aug 4 03:59:01 c-node04 kernel: [36976.388601] ? trace_event_raw_event_hyperv_nested_flush_guest_mapping+0x1b0/0x1b0
> Aug 4 03:59:01 c-node04 kernel: [36976.388603] ? mem_cgroup_try_charge+0x3cc/0x7d0
> Aug 4 03:59:01 c-node04 kernel: [36976.388608] flush_tlb_mm_range+0x25c/0x370
> Aug 4 03:59:01 c-node04 kernel: [36976.388611] ? native_flush_tlb_others+0x3b0/0x3b0
> Aug 4 03:59:01 c-node04 kernel: [36976.388616] ptep_clear_flush+0x192/0x1d0
> Aug 4 03:59:01 c-node04 kernel: [36976.388618] ? pmd_clear_bad+0x70/0x70
> Aug 4 03:59:01 c-node04 kernel: [36976.388622] wp_page_copy+0x861/0x1a30
> Aug 4 03:59:01 c-node04 kernel: [36976.388624] ? follow_pfn+0x2f0/0x2f0
> Aug 4 03:59:01 c-node04 kernel: [36976.388627] ? active_load_balance_cpu_stop+0x10d0/0x10d0
> Aug 4 03:59:01 c-node04 kernel: [36976.388632] ? get_page_from_freelist+0x330c/0x4660
> Aug 4 03:59:01 c-node04 kernel: [36976.388638] ? activate_page+0x660/0x660
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? rb_erase+0x2a40/0x2a40
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? wake_up_page_bit+0x4d0/0x4d0
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x113e/0x1920
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] do_wp_page+0x461/0x1ca0
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? finish_mkwrite_fault+0x710/0x710
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x105d/0x1920
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __zone_watermark_ok+0x33c/0x640
> Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? _raw_spin_lock+0x13/0x30