cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

From: David Hildenbrand
Date: Wed Oct 14 2020 - 11:23:17 EST


Hi everybody,

Michal Privoznik played with "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and reported that this results in [1]:

1. WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

2. Any hugetlbfs allocations failing (I assume because some accounting is wrong).


QEMU with free page reporting uses fallocate(FALLOC_FL_PUNCH_HOLE)
to discard pages that a VM reports as free. Reporting happens at
pageblock granularity, so when the guest reports 2M chunks, QEMU calls
fallocate(FALLOC_FL_PUNCH_HOLE) on one huge page at a time.
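
The discard described above can be sketched in userspace as follows. This
is only a minimal illustration, not QEMU's code: discard_range() and
demo_discard() are hypothetical names, and the file is an anonymous memfd
standing in for guest RAM rather than a hugetlbfs file, so it does not
reproduce the warning. The mode FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
matches RSI=3 in the user-space register dump of report #1, and the 2M
length matches R10=0x200000 there.

```c
#define _GNU_SOURCE     /* exposes fallocate() and memfd_create() in glibc */
#include <fcntl.h>      /* fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/mman.h>   /* memfd_create() */
#include <unistd.h>     /* ftruncate(), pread(), pwrite(), close() */

#define CHUNK_SIZE (2L * 1024 * 1024)   /* one 2M chunk, as in the text above */

/* Hypothetical helper: drop the backing pages of [offset, offset + len)
 * while keeping the file size unchanged. Returns 0 on success, -1 on error. */
static int discard_range(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/* Hypothetical demo: write one byte into the second chunk, discard that
 * chunk, and check that the byte reads back as zero afterwards.
 * Returns 0 if the discard behaved as expected, -1 otherwise. */
static int demo_discard(void)
{
    char byte = 'x';
    int ret = -1;
    int fd = memfd_create("guest-ram-standin", 0); /* assumption: not hugetlbfs */

    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4 * CHUNK_SIZE) != 0)
        goto out;
    if (pwrite(fd, &byte, 1, CHUNK_SIZE) != 1)
        goto out;
    if (discard_range(fd, CHUNK_SIZE, CHUNK_SIZE) != 0)
        goto out;
    if (pread(fd, &byte, 1, CHUNK_SIZE) != 1)
        goto out;
    ret = (byte == 0) ? 0 : -1;     /* a punched hole reads back as zeros */
out:
    close(fd);
    return ret;
}
```

To get closer to the failing setup, the same discard_range() call would
have to target a file on a hugetlbfs mount, from a process inside a cgroup
with the hugetlb controller enabled; I have not verified that this alone is
enough to trigger the warning.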

I was also able to reproduce this (also with virtio-mem, which similarly
uses fallocate(FALLOC_FL_PUNCH_HOLE)) on the latest v5.9
(and on v5.7.X from F32).

It looks like the accounting for fallocate(FALLOC_FL_PUNCH_HOLE)
is broken with cgroups. I did *not* try without cgroups yet.

Any ideas?


Here is report #1:

[ 315.251417] ------------[ cut here ]------------
[ 315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[ 315.251425] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec edac_mce_amd snd_hda_core btusb btrtl btbcm snd_hwdep snd_seq btintel kvm_amd snd_seq_device bluetooth kvm snd_pcm ecdh_generic sp5100_tco irqbypass rfkill snd_timer rapl ecc pcspkr wmi_bmof joydev i2c_piix4 k10temp snd
[ 315.251454] soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel mxm_wmi drm ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[ 315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[ 315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[ 315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
[ 315.251471] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[ 315.251472] RSP: 0018:ffffb60f01ed3b20 EFLAGS: 00010286
[ 315.251473] RAX: fffffffffffd0600 RBX: fffffffffffd0600 RCX: ffff8de8272e3200
[ 315.251473] RDX: 000000000000028e RSI: fffffffffffd0600 RDI: ffff8de838452e40
[ 315.251474] RBP: ffff8de838452e40 R08: ffff8de838452e40 R09: ffff8de837f86c80
[ 315.251475] R10: ffffb60f01ed3b58 R11: 0000000000000001 R12: 0000000000051c00
[ 315.251475] R13: fffffffffffae400 R14: ffff8de8272e3240 R15: 0000000000000571
[ 315.251476] FS: 00007f9c2edfd700(0000) GS:ffff8de83ebc0000(0000) knlGS:0000000000000000
[ 315.251477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 315.251478] CR2: 00007f2a76787e78 CR3: 0000000fcbb1c000 CR4: 0000000000350ee0
[ 315.251479] Call Trace:
[ 315.251485] hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[ 315.251487] region_del+0x1d3/0x300
[ 315.251489] hugetlb_unreserve_pages+0x39/0xb0
[ 315.251492] remove_inode_hugepages+0x1a8/0x3d0
[ 315.251495] ? tlb_finish_mmu+0x7a/0x1d0
[ 315.251497] hugetlbfs_fallocate+0x3c4/0x5c0
[ 315.251519] ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
[ 315.251522] ? file_has_perm+0xa2/0xb0
[ 315.251524] ? inode_security+0xc/0x60
[ 315.251525] ? selinux_file_permission+0x4e/0x120
[ 315.251527] vfs_fallocate+0x146/0x290
[ 315.251529] __x64_sys_fallocate+0x3e/0x70
[ 315.251531] do_syscall_64+0x33/0x40
[ 315.251533] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 315.251535] RIP: 0033:0x7f9d3fb5641f
[ 315.251536] Code: 89 7c 24 08 48 89 4c 24 18 e8 5d fc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 7c 24 08 b8 1d 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 08 e8 8d fc f8 ff 8b 44
[ 315.251537] RSP: 002b:00007f9c2edfc470 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 315.251538] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f9d3fb5641f
[ 315.251539] RDX: 00000000ae200000 RSI: 0000000000000003 RDI: 000000000000000c
[ 315.251539] RBP: 0000557389d6736c R08: 0000000000000000 R09: 000000000000000c
[ 315.251540] R10: 0000000000200000 R11: 0000000000000293 R12: 0000000000200000
[ 315.251540] R13: 00000000ffffffff R14: 00000000ae200000 R15: 00007f9cde000000
[ 315.251542] ---[ end trace 4c88c62ccb1349c9 ]---
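
One possible reading of the register dump above, stated as an assumption
and not as a verified analysis: the warning seems to fire when a counter's
usage drops below zero after an uncharge. If RBX holds the new usage and
R12 the number of pages being uncharged (my guess from the disassembly),
then 0x51c00 pages were uncharged from a counter that only held 0x22200,
leaving 0xfffffffffffd0600, which is negative when read as a signed value.
The sketch below is a simplified userspace model of such a hierarchical
counter, not the kernel's implementation; struct counter,
counter_uncharge() and as_signed() are hypothetical names.

```c
#include <stddef.h>     /* NULL */
#include <stdint.h>     /* int64_t, uint64_t */

/* Simplified model: each counter tracks its usage and points to its parent. */
struct counter {
    long long usage;
    struct counter *parent;
};

/* Uncharge nr_pages from the counter and all of its ancestors.
 * Returns the number of levels whose usage went negative, which is the
 * condition this model treats as the "underflow" the warning reports. */
static int counter_uncharge(struct counter *c, long long nr_pages)
{
    int underflows = 0;

    for (; c != NULL; c = c->parent) {
        c->usage -= nr_pages;
        if (c->usage < 0)
            underflows++;
    }
    return underflows;
}

/* Reinterpret a raw 64-bit register value as a signed number. */
static int64_t as_signed(uint64_t raw)
{
    return (int64_t)raw;
}
```

With the values from the dump, a child counter at 0x22200 (139776) that is
uncharged by 0x51c00 (334848) ends up at -195072, which matches
as_signed(0xfffffffffffd0600). 0x51c00 is also 654 * 512, and RDX holds
0x28e (654), which would fit 654 2M huge pages counted in 4K base pages.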



Here is report #2:

[ 400.920702] ------------[ cut here ]------------
[ 400.920711] WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[ 400.920712] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp rfcomm tun bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep hwmon_vid sunrpc squashfs vfat fat loop btusb btrtl btbcm btintel edac_mce_amd bluetooth snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec kvm_amd snd_hda_core snd_hwdep kvm snd_seq ecdh_generic snd_seq_device rfkill irqbypass snd_pcm ecc joydev sp5100_tco rapl pcspkr
[ 400.920743] wmi_bmof i2c_piix4 k10temp snd_timer snd soundcore acpi_cpufreq ip_tables xfs libcrc32c dm_crypt igb hid_logitech_hidpp video dca amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm mxm_wmi ghash_clmulni_intel ccp nvme nvme_core wmi pinctrl_amd hid_logitech_dj fuse
[ 400.920759] CPU: 13 PID: 2438 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[ 400.920760] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[ 400.920763] RIP: 0010:page_counter_uncharge+0x4b/0x50
[ 400.920765] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[ 400.920766] RSP: 0018:ffffb89e01f5fa20 EFLAGS: 00010286
[ 400.920767] RAX: fffffffffff01200 RBX: fffffffffff01200 RCX: 0000000080400000
[ 400.920768] RDX: 0000000000000800 RSI: fffffffffff01200 RDI: ffff910b78452e40
[ 400.920769] RBP: ffff910b78452e40 R08: ffff910b78452e40 R09: ffff910b70b2a700
[ 400.920769] R10: 0000000000000001 R11: ffff910b5e079300 R12: 0000000000100000
[ 400.920770] R13: fffffffffff00000 R14: ffff910b76185908 R15: 0000000000000000
[ 400.920771] FS: 0000000000000000(0000) GS:ffff910b7ed40000(0000) knlGS:0000000000000000
[ 400.920772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 400.920773] CR2: 00007f90b2d898bc CR3: 000000056ca0e000 CR4: 0000000000350ee0
[ 400.920774] Call Trace:
[ 400.920780] hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[ 400.920783] region_del+0x11b/0x300
[ 400.920786] hugetlb_unreserve_pages+0x39/0xb0
[ 400.920788] remove_inode_hugepages+0x3c2/0x3d0
[ 400.920792] hugetlbfs_evict_inode+0x1a/0x40
[ 400.920795] evict+0xd1/0x1a0
[ 400.920797] __dentry_kill+0xe4/0x180
[ 400.920799] __fput+0xec/0x240
[ 400.920802] task_work_run+0x65/0xa0
[ 400.920804] do_exit+0x34c/0xad0
[ 400.920806] do_group_exit+0x33/0xa0
[ 400.920808] get_signal+0x179/0x8d0
[ 400.920811] arch_do_signal+0x30/0x700
[ 400.920832] ? kvm_vcpu_ioctl+0x29f/0x590 [kvm]
[ 400.920835] exit_to_user_mode_prepare+0xf7/0x160
[ 400.920838] syscall_exit_to_user_mode+0x31/0x1b0
[ 400.920841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 400.920843] RIP: 0033:0x7f90b3008e92
[ 400.920843] Code: Bad RIP value.
[ 400.920844] RSP: 002b:00007f8d45ffa770 EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
[ 400.920845] RAX: fffffffffffffe00 RBX: 0000000000000014 RCX: 00007f90b3008e92
[ 400.920846] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000055efd2951db8
[ 400.920846] RBP: 000055efd2951d90 R08: 0000000000000000 R09: 000055efd18f29a0
[ 400.920847] R10: 0000000000000000 R11: 0000000000000282 R12: 0000000000000000
[ 400.920848] R13: 000055efd190ff60 R14: 000055efd2951db8 R15: 00007f8d45ffa7a0
[ 400.920850] ---[ end trace bd4d1b0930afe999 ]---



[1] https://www.redhat.com/archives/libvir-list/2020-October/msg00872.html

--
Thanks,

David / dhildenb