Re: mm: hangs in collapse_huge_page

From: Kirill A. Shutemov
Date: Thu Apr 30 2015 - 18:24:44 EST


On Thu, Apr 30, 2015 at 06:17:34PM -0400, Sasha Levin wrote:
> On 04/30/2014 11:42 AM, Kirill A. Shutemov wrote:
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b4b1feba6472..1c6ace5207b9 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1986,6 +1986,8 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
> >  
> >  static inline int khugepaged_test_exit(struct mm_struct *mm)
> >  {
> > +	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem) &&
> > +			!spin_is_locked(&khugepaged_mm_lock));
> >  	return atomic_read(&mm->mm_users) == 0;
> >  }
>
> I've managed to hit this during testing:
>
> [ 8048.304275] kernel BUG at mm/huge_memory.c:2060!
> [ 8048.305878] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [ 8048.307479] Modules linked in: quota_v2 quota_tree xfs libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ast kvm ttm drm_kms_helper crct10dif_pclmul crc32_pclmul drm ghash_clmulni_intel
> aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd joydev i2c_algo_bit sb_edac syscopyarea sysfillrect edac_core sysimgblt lpc_ich ipmi_si ipmi_msghandler ioatdma shpchp mac_hid btrfs xor mlx4_en vxlan raid6_pq
> hid_generic ixgbe mlx4_core usbhid hid dca megaraid_sas ahci ptp libahci pps_core mdio
> [ 8048.314422] CPU: 31 PID: 13065 Comm: thp01 Not tainted 4.1.0-rc1-next-20150430+ #8
> [ 8048.316215] Hardware name: Oracle Corporation OVCA X3-2 /ASSY,MOTHERBOARD,1U , BIOS 17021300 06/19/2012
> [ 8048.318070] task: ffff8837ba9b3b40 ti: ffff8837bfcf8000 task.ti: ffff8837bfcf8000
> [ 8048.319941] RIP: __khugepaged_enter (mm/huge_memory.c:2059 mm/huge_memory.c:2075)
> [ 8048.321856] RSP: 0018:ffff8837bfcff8a0 EFLAGS: 00010246
> [ 8048.323752] RAX: 000000000000d800 RBX: ffff8837b8314b00 RCX: 0000000000000000
> [ 8048.325665] RDX: 00000000000000d8 RSI: 00000000000000fc RDI: ffff8837b8314ba8
> [ 8048.327570] RBP: ffff8837bfcff8e0 R08: ffff8837df1e5040 R09: ffffed06f4b701b8
> [ 8048.329486] R10: 000000002a82d01f R11: 1ffff106f82c0f77 R12: ffff8837a5b80d98
> [ 8048.331414] R13: ffff8837c6c58b80 R14: ffff8837c6c58bd0 R15: 0000000000000000
> [ 8048.333357] FS: 00007f238e593740(0000) GS:ffff8837df1c0000(0000) knlGS:0000000000000000
> [ 8048.335329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8048.337304] CR2: 00007f8a8d2c4740 CR3: 00000037c7c10000 CR4: 00000000000407e0
> [ 8048.339369] Stack:
> [ 8048.341343] 0000000000000000 00000007fffffffe ffff883700000001 ffff8837b8314b00
> [ 8048.343382] 00007fffffc00000 ffff8837c6c58b80 ffff8837c6c58bd0 0000000000000000
> [ 8048.345421] ffff8837bfcff910 ffffffff815cfa60 ffff8837bfcff910 ffffffff81249cd3
> [ 8048.347473] Call Trace:
> [ 8048.349502] khugepaged_enter_vma_merge (include/linux/khugepaged.h:46 mm/huge_memory.c:2115)
> [ 8048.351584] ? up_write (kernel/locking/rwsem.h:9 kernel/locking/rwsem.c:93)
> [ 8048.353654] expand_downwards (mm/mmap.c:2278)
> [ 8048.355719] ? __mem_cgroup_count_vm_event (mm/memcontrol.c:1156)
> [ 8048.357791] handle_mm_fault (mm/memory.c:2673 mm/memory.c:3250 mm/memory.c:3371 mm/memory.c:3400)
> [ 8048.359886] ? follow_page_pte (mm/gup.c:48)
> [ 8048.361952] ? __pmd_alloc (mm/memory.c:3382)
> [ 8048.364020] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183)
> [ 8048.366083] ? follow_page_pte (mm/gup.c:125)
> [ 8048.368139] ? follow_page_mask (mm/gup.c:209)
> [ 8048.370181] __get_user_pages (mm/gup.c:285 mm/gup.c:477)
> [ 8048.372214] ? follow_page_mask (mm/gup.c:420)
> [ 8048.374242] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3762)
> [ 8048.376269] get_user_pages (mm/gup.c:818)

We call get_user_pages() here without ->mmap_sem taken. This violates the
get_user_pages() interface, but it should not cause a problem: nothing else
can run concurrently on this mm yet, since this is the exec path.
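
To make the failure mode concrete, here is a minimal userspace model of the
condition the new VM_BUG_ON() tests. It is only a sketch: struct lock_state
and test_exit_check_would_bug() are hypothetical names invented for this
illustration, and the two booleans merely stand in for what
rwsem_is_locked(&mm->mmap_sem) and spin_is_locked(&khugepaged_mm_lock) would
report in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the lock state the check observes. */
struct lock_state {
	bool mmap_sem_locked;		/* models rwsem_is_locked(&mm->mmap_sem) */
	bool khugepaged_mm_lock_held;	/* models spin_is_locked(&khugepaged_mm_lock) */
};

/*
 * Returns true when the condition passed to VM_BUG_ON() in the patch
 * would be true, i.e. when neither lock is held.
 */
static bool test_exit_check_would_bug(const struct lock_state *s)
{
	return !s->mmap_sem_locked && !s->khugepaged_mm_lock_held;
}
```

With both fields false, which is what the trace suggests for this exec-path
get_user_pages() call, the function returns true and the assertion fires.
Holding either lock is enough to satisfy the check.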

Not sure if we should correct it.

Hm. __bprm_mm_init() in the same exec path takes ->mmap_sem.

Any comments?

> [ 8048.378295] copy_strings.isra.20 (fs/exec.c:197 fs/exec.c:510)
> [ 8048.380392] ? count.isra.18.constprop.36 (fs/exec.c:454)
> [ 8048.382439] ? copy_strings_kernel (fs/exec.c:556)
> [ 8048.384464] do_execveat_common.isra.32 (fs/exec.c:1577)
> [ 8048.386469] ? do_execveat_common.isra.32 (include/linux/spinlock.h:312 fs/exec.c:1263 fs/exec.c:1518)
> [ 8048.388448] ? prepare_bprm_creds (fs/exec.c:1475)
> [ 8048.390395] ? kmem_cache_alloc (include/trace/events/kmem.h:53 mm/slub.c:2524)
> [ 8048.392309] ? getname_flags (fs/namei.c:135)
> [ 8048.394187] ? up_read (./arch/x86/include/asm/rwsem.h:156 kernel/locking/rwsem.c:81)
> [ 8048.396027] ? getname_flags (fs/namei.c:146)
> [ 8048.397869] SyS_execve (fs/exec.c:1701)
> [ 8048.399715] stub_execve (arch/x86/kernel/entry_64.S:510)
> [ 8048.401482] ? system_call_fastpath (arch/x86/kernel/entry_64.S:261)
> [ 8048.403207] Code: 1f 84 00 00 00 00 00 b8 f4 ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f b7 05 a9 fb db 01 0f b6 d4 31 d0 a8 fe 0f 85 3e fe ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 48 89 df e8 18 88 f6 ff 0f
> All code
> ========
> 0: 1f (bad)
> 1: 84 00 test %al,(%rax)
> 3: 00 00 add %al,(%rax)
> 5: 00 00 add %al,(%rax)
> 7: b8 f4 ff ff ff mov $0xfffffff4,%eax
> c: c3 retq
> d: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 14: 00 00 00
> 17: 0f b7 05 a9 fb db 01 movzwl 0x1dbfba9(%rip),%eax # 0x1dbfbc7
> 1e: 0f b6 d4 movzbl %ah,%edx
> 21: 31 d0 xor %edx,%eax
> 23: a8 fe test $0xfe,%al
> 25: 0f 85 3e fe ff ff jne 0xfffffffffffffe69
> 2b:* 0f 0b ud2 <-- trapping instruction
> 2d: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 34: 00 00 00
> 37: 48 89 df mov %rbx,%rdi
> 3a: e8 18 88 f6 ff callq 0xfffffffffff68857
> 3f:
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
> 9: 00 00 00
> c: 48 89 df mov %rbx,%rdi
> f: e8 18 88 f6 ff callq 0xfffffffffff6882c
> 14:
> [ 8048.406837] RIP __khugepaged_enter (mm/huge_memory.c:2059 mm/huge_memory.c:2075)
> [ 8048.408525] RSP <ffff8837bfcff8a0>
>
>
> Thanks,
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Kirill A. Shutemov