Re: [patch] mm: sparsemem memory_present() memory corruption fix

From: Mel Gorman
Date: Wed Apr 16 2008 - 10:05:30 EST


On (16/04/08 02:03), Ingo Molnar didst pronounce:
>
> finally found it ... the patch below solves the sparsemem crash and the
> testsystem boots up fine now:
>
> mars:~> uname -a
> Linux mars 2.6.25-rc9-sched-devel.git-x86-latest.git #985 SMP Wed Apr 16
> 01:37:37 CEST 2008 i686 i686 i386 GNU/Linux
>
> yay! :-)
>

Very cool :) This fixed the silent lock-up that I was getting when using
your config as well.

At a bit of a loss yesterday to explain what was going wrong, I had started
putting together patches to sanity check memory initialisation at various
different stages trying to catch where things were going pear-shaped. You
found the bug before it was done but I finished the basics anyway and posted
it as "[RFC] Verification and debugging of memory initialisation". Something
like it may help avoid similar headaches for people who tend to run into
(or cause) boot problems.

> ps. anyone who can correctly guess the method with which i found the
> exact place that corrupted memory will get a free beer next time we
> meet :-)
>
> ------------------------->
> Subject: mm: sparsemem memory_present() memory corruption fix
> From: Ingo Molnar <mingo@xxxxxxx>
> Date: Wed Apr 16 01:40:00 CEST 2008
>
> fix memory corruption and crash on 32-bit x86 systems.
>
> if a !PAE x86 kernel is booted on a 32-bit system with more than
> 4GB of RAM, then we call memory_present() with a start/end that
> goes outside the scope of MAX_PHYSMEM_BITS.
>
> that causes this loop to happily walk over the limit of the
> sparse memory section map:
>
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> unsigned long section = pfn_to_section_nr(pfn);
> struct mem_section *ms;
>
> sparse_index_init(section, nid);
> set_section_nid(section, nid);
>
> ms = __nr_to_section(section);
> if (!ms->section_mem_map)
> ms->section_mem_map = sparse_encode_early_nid(nid) |
>
> 'ms' will be out of bounds and we'll corrupt a small amount of memory by
> encoding the node ID. Depending on what that memory is, we might crash,
> misbehave or just not notice the bug.
>
> the fix is to sanity check anything the architecture passes to sparsemem.
>
> this bug seems to be rather old (as old as sparsemem support itself),
> but the exact incarnation depended on random details like configs,
> which made this bug more prominent in v2.6.25-to-be.
>
> an additional enhancement might be to print a warning about ignored
> or trimmed memory ranges.
>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> mm/sparse.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> Index: linux/mm/sparse.c
> ===================================================================
> --- linux.orig/mm/sparse.c
> +++ linux/mm/sparse.c
> @@ -149,8 +149,18 @@ static inline int sparse_early_nid(struc
> /* Record a memory area against a node. */
> void __init memory_present(int nid, unsigned long start, unsigned long end)
> {
> + unsigned long max_arch_pfn = 1ULL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
> unsigned long pfn;
>
> + /*
> + * Sanity checks - do not allow an architecture to pass
> + * in larger pfns than the maximum scope of sparsemem:
> + */
> + if (start >= max_arch_pfn)
> + return;
> + if (end >= max_arch_pfn)
> + end = max_arch_pfn;
> +
> start &= PAGE_SECTION_MASK;
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> unsigned long section = pfn_to_section_nr(pfn);
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/