[PATCH] mm/page_ext: reserve more space in case of unaligned node range

From: Joonsoo Kim
Date: Tue Dec 02 2014 - 19:48:16 EST


When the page allocator's buddy algorithm checks a buddy's status,
the checked page can lie outside the node's pfn range. In that case,
lookup_page_ext() returns an invalid address, and dereferencing it
crashes the kernel. For example, if node_start_pfn is 1 and the page
with pfn 1 is freed to the page allocator, page_is_buddy() checks the
page with pfn 0. The page_ext code computes the offset as
pfn - node_start_pfn, so 0 - 1 underflows to (unsigned long)-1 and
base + offset points far outside the page_ext table. This causes the
oops reported by Fengguang; a minimal sketch of the failing arithmetic
is shown first, followed by the trace.
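
To make the failing index concrete, here is a minimal userspace sketch
of the pre-patch computation. It is a simplified model of what
lookup_page_ext() does, not the kernel code itself, and the values are
the example numbers from above:

#include <stdio.h>

int main(void)
{
	/* assumed values mirroring the report: node starts at pfn 1 */
	unsigned long node_start_pfn = 1;
	unsigned long buddy_pfn = 0;	/* order-0 buddy of pfn 1 */

	/* page_ext offset as computed before this patch */
	unsigned long offset = buddy_pfn - node_start_pfn;

	/* prints 0xffffffffffffffff on 64-bit: base + offset is wild */
	printf("offset = 0x%lx\n", offset);
	return 0;
}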

[ 0.480155] BUG: unable to handle kernel paging request at d26bdffc
[ 0.481566] IP: [<c110bc7a>] free_one_page+0x31a/0x3e0
[ 0.482801] *pdpt = 0000000001866001 *pde = 0000000012584067 *pte = 80000000126bd060
[ 0.483333] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
snip...
[ 0.483333] Call Trace:
[ 0.483333]  [<c110bdec>] __free_pages_ok+0xac/0xf0
[ 0.483333]  [<c110c769>] __free_pages+0x19/0x30
[ 0.483333]  [<c1144ca5>] kfree+0x75/0xf0
[ 0.483333]  [<c111b595>] ? kvfree+0x45/0x50
[ 0.483333]  [<c111b595>] kvfree+0x45/0x50
[ 0.483333]  [<c134bb73>] rhashtable_expand+0x1b3/0x1e0
[ 0.483333]  [<c17fc9f9>] test_rht_init+0x173/0x2e8
[ 0.483333]  [<c134b750>] ? jhash2+0xe0/0xe0
[ 0.483333]  [<c134b790>] ? rhashtable_hashfn+0x20/0x20
[ 0.483333]  [<c134b7b0>] ? rht_grow_above_75+0x20/0x20
[ 0.483333]  [<c134b7d0>] ? rht_shrink_below_30+0x20/0x20
[ 0.483333]  [<c134b750>] ? jhash2+0xe0/0xe0
[ 0.483333]  [<c134b790>] ? rhashtable_hashfn+0x20/0x20
[ 0.483333]  [<c134b7b0>] ? rht_grow_above_75+0x20/0x20
[ 0.483333]  [<c134b7d0>] ? rht_shrink_below_30+0x20/0x20
[ 0.483333]  [<c17fc886>] ? test_rht_lookup+0x8f/0x8f
[ 0.483333]  [<c1000486>] do_one_initcall+0xc6/0x210
[ 0.483333]  [<c17fc886>] ? test_rht_lookup+0x8f/0x8f
[ 0.483333]  [<c17d0505>] ? repair_env_string+0x12/0x54
[ 0.483333]  [<c17d0cf3>] kernel_init_freeable+0x193/0x213
[ 0.483333]  [<c1512500>] kernel_init+0x10/0xf0
[ 0.483333]  [<c151c5c1>] ret_from_kernel_thread+0x21/0x30
[ 0.483333]  [<c15124f0>] ? rest_init+0xb0/0xb0
snip...
[ 0.483333] EIP: [<c110bc7a>] free_one_page+0x31a/0x3e0 SS:ESP 0068:c0041de0
[ 0.483333] CR2: 00000000d26bdffc
[ 0.483333] ---[ end trace 7648e12f817ef2ad ]---

The struct page case already handles this by taking the alignment of
node_start_pfn into account, so this patch follows the same approach:
round node_start_pfn down to a MAX_ORDER_NR_PAGES boundary for the
lookup, and reserve extra page_ext entries when the node range is
unaligned.
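
As a sanity check, here is a userspace sketch of the post-patch
arithmetic. round_down() and IS_ALIGNED() are re-implemented here only
to keep the example standalone, and MAX_ORDER_NR_PAGES and the pfn
values are assumed example numbers:

#include <stdio.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* example value, config-dependent */
#define round_down(x, y)	((x) & ~((y) - 1))
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long node_start_pfn = 1, node_end_pfn = 4097;
	unsigned long nr_pages = node_end_pfn - node_start_pfn;

	/* reserve one extra MAX_ORDER block for the unaligned ends */
	if (!IS_ALIGNED(node_start_pfn, MAX_ORDER_NR_PAGES) ||
	    !IS_ALIGNED(node_end_pfn, MAX_ORDER_NR_PAGES))
		nr_pages += MAX_ORDER_NR_PAGES;

	/* buddy pfn 0 now maps inside the table: 0 - 0 = 0 */
	unsigned long offset = 0 - round_down(node_start_pfn,
					      MAX_ORDER_NR_PAGES);

	printf("nr_pages = %lu, offset = %lu\n", nr_pages, offset);
	return 0;
}

With the rounded-down base, any pfn that page_is_buddy() can probe
within the node's MAX_ORDER blocks maps inside the enlarged table.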

Reported-by: Fengguang Wu <fengguang.wu@xxxxxxxxx>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
---
mm/page_ext.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index ce86485..184f3ef 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -112,7 +112,8 @@ struct page_ext *lookup_page_ext(struct page *page)
 	if (unlikely(!base))
 		return NULL;
 #endif
-	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
+	offset = pfn - round_down(node_start_pfn(page_to_nid(page)),
+					MAX_ORDER_NR_PAGES);
 	return base + offset;
 }

@@ -126,6 +127,15 @@ static int __init alloc_node_page_ext(int nid)
 	if (!nr_pages)
 		return 0;
 
+	/*
+	 * Need extra space if node range is not aligned with
+	 * MAX_ORDER_NR_PAGES. When page allocator's buddy algorithm
+	 * checks buddy's status, range could be out of exact node range.
+	 */
+	if (!IS_ALIGNED(node_start_pfn(nid), MAX_ORDER_NR_PAGES) ||
+	    !IS_ALIGNED(node_end_pfn(nid), MAX_ORDER_NR_PAGES))
+		nr_pages += MAX_ORDER_NR_PAGES;
+
 	table_size = sizeof(struct page_ext) * nr_pages;
 
 	base = memblock_virt_alloc_try_nid_nopanic(
--
1.7.9.5
