[PATCH] drivers: memory: check for missing sections when testing zones

From: Andrew Banman
Date: Thu Dec 03 2015 - 12:58:34 EST


test_pages_in_a_zone does not account for the possibility of missing sections
in the given pfn range. Since pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, invalid pfns from missing sections
will pass the test, resulting in a kernel oops. This is remedied by simply
checking for the presence of the pfn's section. We don't have to remove
the pfn_valid_within optimization.

The patch also prevents a crash from offlining memory devices with missing
sections. Despite this, it's probably best to keep

[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections

because missing sections may indicate other problems, like overlapping mem
blocks and who knows what else (see the discussion at BZ 107781).

---
mm/memory_hotplug.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..74f5bcd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1383,6 +1383,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 	     pfn < end_pfn;
 	     pfn += MAX_ORDER_NR_PAGES) {
 		i = 0;
+		/* Make sure the memory section is present */
+		if (!present_section_nr(pfn_to_section_nr(pfn)))
+			continue;
 		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
 		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
 			i++;
--
1.7.12.4
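
For illustration only, here is a user-space sketch of the loop above. It is
not kernel code: the constants, the section_present[] table and the helper
unsafe_chunks() are hypothetical stand-ins, and pfn_valid_within() is modelled
as always returning true, as it does when CONFIG_HOLES_IN_ZONE is not set. It
counts the chunks whose struct page would be inspected although their section
is missing, with and without the added check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy sizes; the real values depend on the architecture. */
#define PAGES_PER_SECTION  8UL
#define MAX_ORDER_NR_PAGES 4UL
#define NR_SECTIONS        4UL

/* Toy stand-in for the sparsemem section map: true means "present". */
static bool section_present[NR_SECTIONS];

static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn / PAGES_PER_SECTION;
}

static bool present_section_nr(unsigned long nr)
{
	return nr < NR_SECTIONS && section_present[nr];
}

/* Models pfn_valid_within() without CONFIG_HOLES_IN_ZONE: always valid. */
static bool pfn_valid_within(unsigned long pfn)
{
	(void)pfn;
	return true;
}

/*
 * Walk [start_pfn, end_pfn) the way test_pages_in_a_zone() does and count
 * the chunks where the loop would go on to look at a page that belongs to
 * a missing section. check_section enables the test added by the patch.
 */
static unsigned long unsafe_chunks(unsigned long start_pfn,
				   unsigned long end_pfn,
				   bool check_section)
{
	unsigned long pfn, i, unsafe = 0;

	assert(start_pfn <= end_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES) {
		i = 0;
		/* The added check: skip chunks in sections that are absent. */
		if (check_section &&
		    !present_section_nr(pfn_to_section_nr(pfn)))
			continue;
		/* With the model above this loop never advances i. */
		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
			i++;
		if (i == MAX_ORDER_NR_PAGES)
			continue;
		/* The kernel would dereference the page for pfn + i here. */
		if (!present_section_nr(pfn_to_section_nr(pfn + i)))
			unsafe++;
	}
	return unsafe;
}
```

With sections 0, 2 and 3 marked present and section 1 missing, calling
unsafe_chunks(0, 32, false) reports the two chunks that fall in section 1,
while unsafe_chunks(0, 32, true) reports none.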


On 12/02/2015 04:45 PM, Andrew Morton wrote:
> On Wed, 2 Dec 2015 09:07:01 -0600 Seth Jennings <sjennings@xxxxxxxxxxxxxx> wrote:
>
>> bdee237c and 982792c7 introduced large block sizes for x86.
>> This made it possible to have multiple sections per memory
>> block where previously, there was only ever one section
>> per block.
>>
>> Since blocks consist of contiguous ranges of sections, there
>> can be holes in the blocks where sections are not present.
>> If one attempts to offline such a block, a crash occurs since
>> the code is not designed to deal with this.
>>
>> This patch is a quick fix to guard against the crash by
>> not allowing blocks with non-present sections to be offlined.
>>
>> ...
>>
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -303,6 +303,10 @@ static int memory_subsys_offline(struct device *dev)
>>  	if (mem->state == MEM_OFFLINE)
>>  		return 0;
>>
>> +	/* Can't offline block with non-present sections */
>> +	if (mem->section_count != sections_per_block)
>> +		return -EINVAL;
>> +
>>  	return memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
>>  }
>
> [3/3] fixes a kernel crash so I've tagged it for -stable and shall move
> it ahead of [1/2] and [2/2], which are merely cleanups.
>
> This assumes that [3/3] is independent of the other two patches. I'll
> eat my hat if it isn't.
>