Re: [PATCH] memory_hotplug: fix the panic when memory end is not on the section boundary

From: Zaslonko Mikhail
Date: Tue Sep 11 2018 - 10:06:36 EST




On 10.09.2018 15:17, Michal Hocko wrote:
[Cc Pavel]

On Mon 10-09-18 14:35:27, Mikhail Zaslonko wrote:
If memory end is not aligned with the linux memory section boundary, such
a section is only partly initialized. This may lead to VM_BUG_ON due to
uninitialized struct pages access from is_mem_section_removable() or
test_pages_in_a_zone() function.

Here is one of the panic examples:
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=3075M
OK, so the last memory section is not full and we have a partial memory
block right?

Right. In my example above, I define 3075M (3Gig + 3Meg) of base memory in the
kernel parameters. As a result we end up with the last memory block having only
3 megabytes initialized. The initialization takes place within
memmap_init_zone(unsigned long size, ...) function called from
free_area_init_core() with the size = zone->spanned_pages. Thus, only three
megabytes of the last memory block are initialized (till the end of the zone
Normal). And with the page poisoning introduced by Pavel we fail on such a
memory block processing in memory_hotplug code (no actual memory hotplug
is involved here).


page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
OK, this means that the struct page is not fully initialized. Do you
have a specific place which has triggered this assert?

This assert is triggered in page_to_nid() function when it is called for
uninitialized page. I found two places where that can happen:
1) is_pageblock_removable_nolock() - direct call
2) test_pages_in_a_zone() - via page_zone() call


------------[ cut here ]------------
Call Trace:
([<000000000039b8a4>] is_mem_section_removable+0xcc/0x1c0)
[<00000000009558ba>] show_mem_removable+0xda/0xe0
[<00000000009325fc>] dev_attr_show+0x3c/0x80
[<000000000047e7ea>] sysfs_kf_seq_show+0xda/0x160
[<00000000003fc4e0>] seq_read+0x208/0x4c8
[<00000000003cb80e>] __vfs_read+0x46/0x180
[<00000000003cb9ce>] vfs_read+0x86/0x148
[<00000000003cc06a>] ksys_read+0x62/0xc0
[<0000000000c001c0>] system_call+0xdc/0x2d8

This fix checks if the page lies within the zone boundaries before
accessing the struct page data. The check is added to both functions.
Actually similar check has already been present in
is_pageblock_removable_nolock() function but only after the struct page
is accessed.

Well, I am afraid this is not the proper solution. We are relying on the
full pageblock worth of initialized struct pages at many other place. We
used to do that in the past because we have initialized the full
section but this has been changed recently. Pavel, do you have any ideas
how to deal with this partial mem sections now?

I think this is not related to the recent changes of memory initialization. If
you mean deferred init case, the problem exists even without
CONFIG_DEFERRED_STRUCT_PAGE_INIT kernel option.

Signed-off-by: Mikhail Zaslonko <zaslonko@xxxxxxxxxxxxx>
Reviewed-by: Gerald Schaefer <gerald.schaefer@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
---
mm/memory_hotplug.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9eea6e809a4e..8e20e8fcc3b0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1229,9 +1229,8 @@ static struct page *next_active_pageblock(struct page *page)
return page + pageblock_nr_pages;
}
-static bool is_pageblock_removable_nolock(struct page *page)
+static bool is_pageblock_removable_nolock(struct page *page, struct zone **zone)
{
- struct zone *zone;
unsigned long pfn;
/*
@@ -1241,15 +1240,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
* We have to take care about the node as well. If the node is offline
* its NODE_DATA will be NULL - see page_zone.
*/
- if (!node_online(page_to_nid(page)))
- return false;
-
- zone = page_zone(page);
pfn = page_to_pfn(page);
- if (!zone_spans_pfn(zone, pfn))
+ if (*zone && !zone_spans_pfn(*zone, pfn))
return false;
+ if (!node_online(page_to_nid(page)))
+ return false;
+ *zone = page_zone(page);
- return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, true);
+ return !has_unmovable_pages(*zone, page, 0, MIGRATE_MOVABLE, true);
}
/* Checks if this range of memory is likely to be hot-removable. */
@@ -1257,10 +1255,11 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
{
struct page *page = pfn_to_page(start_pfn);
struct page *end_page = page + nr_pages;
+ struct zone *zone = NULL;
/* Check the starting page of each pageblock within the range */
for (; page < end_page; page = next_active_pageblock(page)) {
- if (!is_pageblock_removable_nolock(page))
+ if (!is_pageblock_removable_nolock(page, &zone))
return false;
cond_resched();
}
@@ -1296,6 +1295,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
i++;
if (i == MAX_ORDER_NR_PAGES || pfn + i >= end_pfn)
continue;
+ /* Check if we got outside of the zone */
+ if (zone && !zone_spans_pfn(zone, pfn))
+ return 0;
page = pfn_to_page(pfn + i);
if (zone && page_zone(page) != zone)
return 0;
--
2.16.4