Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

From: Vlastimil Babka
Date: Thu May 22 2014 - 11:08:17 EST


On 05/22/2014 03:58 PM, Dave Jones wrote:
Not sure if Sasha has already reported this on -next (It's getting hard
to keep track of all the VM bugs he's been finding), but I hit this overnight
on .15-rc6. First time I've seen this one.


page:ffffea0004599800 count:0 mapcount:0 mapping: (null) index:0x2
page flags: 0x20000000008000(tail)
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:415!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 6858 Comm: trinity-c42 Not tainted 3.15.0-rc6+ #216
task: ffff88012d18e900 ti: ffff88009e87a000 task.ti: ffff88009e87a000
RIP: 0010:[<ffffffffbb718d98>] [<ffffffffbb718d98>] PageTransHuge.part.23+0xb/0xd
RSP: 0000:ffff88009e87b940 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000116660 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffffbb0c00f8 RDI: ffffffffbb0bfed2
RBP: ffff88009e87b940 R08: ffffffffbc01203c R09: 00000000000003da
R10: 00000000000003d9 R11: 0000000000000003 R12: 0000000000000001
R13: 0000000000116800 R14: ffff88024d64ce00 R15: ffffea0004599800
FS: 00007f4fd192e740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000004c00000 CR3: 00000000a19ce000 CR4: 00000000001407e0
DR0: 00000000024f4000 DR1: 0000000001d43000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
ffff88009e87b9e8 ffffffffbb1728a3 ffff88009e87b9e8 ffff88009e87baa8
ffff88012d18e900 ffff88009e87ba60 0000000000000000 0000000400000016
0000000000000000 ffff88009e87bfd8 00000000000008b3 ffff88009e87ba50
Call Trace:
[<ffffffffbb1728a3>] isolate_migratepages_range+0x7a3/0x870
[<ffffffffbb172d90>] compact_zone+0x370/0x560
[<ffffffffbb173022>] compact_zone_order+0xa2/0x110
[<ffffffffbb1733f1>] try_to_compact_pages+0x101/0x130
[<ffffffffbb71861b>] __alloc_pages_direct_compact+0xac/0x1d0
[<ffffffffbb15760b>] __alloc_pages_nodemask+0x6ab/0xaf0
[<ffffffffbb19c9ea>] alloc_pages_vma+0x9a/0x160
[<ffffffffbb1aef0d>] do_huge_pmd_anonymous_page+0xfd/0x3c0
[<ffffffffbb0a19cd>] ? get_parent_ip+0xd/0x50
[<ffffffffbb17ac18>] handle_mm_fault+0x158/0xcb0
[<ffffffffbb72594d>] ? retint_restore_args+0xe/0xe
[<ffffffffbb728bb6>] __do_page_fault+0x1a6/0x620
[<ffffffffbb11011e>] ? __acct_update_integrals+0x8e/0x120
[<ffffffffbb0a19cd>] ? get_parent_ip+0xd/0x50
[<ffffffffbb72949b>] ? preempt_count_sub+0x6b/0xf0
[<ffffffffbb72904e>] do_page_fault+0x1e/0x70
Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff 5d c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe
RIP [<ffffffffbb718d98>]

That BUG is..

413 static inline int PageTransHuge(struct page *page)
414 {
415 	VM_BUG_ON_PAGE(PageTail(page), page);
416 	return PageHead(page);
417 }

Any idea which of the two PageTransHuge() calls in isolate_migratepages_range() that is? The offset far into the function suggests it's the one where the lru lock is already held, but I'm not sure, as the decodecode output of your dump and the objdump of my own build look wildly different.

If it's indeed the later PageTransHuge() call, it means that somebody else cleared PageLRU and set PageTail (I don't think a page can have both at once) between the PageLRU() and PageTransHuge() checks in isolate_migratepages_range(), even though isolate_migratepages_range() was holding lru_lock at that point. That's quite weird...
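
To make the suspected window concrete, here is a minimal userspace sketch of the check ordering described above. It is not the kernel code: the flag bits, struct page layout, scan_one(), the race_hook callback and become_tail() are all hypothetical names invented for this model, and the hook merely stands in for "somebody else" changing the page flags between the two checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; not the kernel's layout. */
enum { PG_LRU = 1u << 0, PG_HEAD = 1u << 1, PG_TAIL = 1u << 2 };

/* Toy stand-in for struct page: only the flags matter here. */
struct page { unsigned int flags; };

static bool page_lru(const struct page *p)  { return (p->flags & PG_LRU) != 0; }
static bool page_head(const struct page *p) { return (p->flags & PG_HEAD) != 0; }
static bool page_tail(const struct page *p) { return (p->flags & PG_TAIL) != 0; }

/* Possible outcomes of examining one page in the model. */
enum scan_result { SKIPPED_NOT_LRU, ISOLATED_BASE, ISOLATED_HUGE, WOULD_BUG };

/* Hypothetical hook that models a concurrent flag change. */
typedef void (*race_hook)(struct page *);

/*
 * Simplified model of the two checks discussed in this thread:
 * first PageLRU(), then PageTransHuge(). Instead of crashing,
 * the tail-page assertion is reported as WOULD_BUG.
 */
static enum scan_result scan_one(struct page *page, race_hook between_checks)
{
	assert(page != NULL);

	if (!page_lru(page))
		return SKIPPED_NOT_LRU;

	/* The window that should not exist while lru_lock is held. */
	if (between_checks)
		between_checks(page);

	/* Models VM_BUG_ON_PAGE(PageTail(page), page) in PageTransHuge(). */
	if (page_tail(page))
		return WOULD_BUG;

	return page_head(page) ? ISOLATED_HUGE : ISOLATED_BASE;
}

/* Models the suspected change: PageLRU cleared, PageTail set. */
static void become_tail(struct page *page)
{
	page->flags = PG_TAIL;
}
```

Calling scan_one() on a page with only PG_LRU set and no hook yields ISOLATED_BASE, while passing become_tail as the hook yields WOULD_BUG. In this model the assertion can only fire if the flags change after the PageLRU() check, which is why hitting it with the lock held looks so odd.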

Vlastimil

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx For more info on Linux MM,
see: http://www.linux-mm.org/ .

