Re: + bootmem-node-setup-agnostic-free_bootmem.patch added to -mm tree

From: Yinghai Lu
Date: Tue Apr 15 2008 - 16:04:27 EST


On Tue, Apr 15, 2008 at 12:55 PM, Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
> Hi,
>
>
>
> "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> writes:
>
> > On Tue, Apr 15, 2008 at 5:51 AM, Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
> >> Hi Ingo,
> >>
> >>
> >>
> >> Ingo Molnar <mingo@xxxxxxx> writes:
> >>
> >> > * akpm@xxxxxxxxxxxxxxxxxxxx <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> >
> >> >> Subject: bootmem: node-setup agnostic free_bootmem()
> >> >> From: Johannes Weiner <hannes@xxxxxxxxxxxx>
> >> >>
> >> >> Make free_bootmem() look up the node holding the specified address
> >> >> range which lets it work transparently on single-node and multi-node
> >> >> configurations.
> >> >
> >> > this patch does not fix the bug Yinghai's (now dropped) patches solved:
> >> > reserve_early() allocations. So NAK until the full problem has been
> >> > sorted out ...
> >>
> >> Okay, NAK on -mm and -x86 for sure. The patch was meant for mainline
> >> where there is no need for free_bootmem() going across nodes, right?
> >>
> >> But I still object to the way Yinghai implemented it.
> >> free_bootmem_core() should not be twisted like this.
> >>
> >> How about the following (untested, even uncompiled, but you should get
> >> the idea) proposal which would replace the patch discussed in this
> >> thread:
> >>
> >> --- tree-linus.orig/mm/bootmem.c
> >> +++ tree-linus/mm/bootmem.c
> >> @@ -421,7 +421,25 @@ int __init reserve_bootmem(unsigned long
> >>
> >>
> >> void __init free_bootmem(unsigned long addr, unsigned long size)
> >> {
> >> -	free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
> >> +	bootmem_data_t *bdata;
> >> +
> >> +	list_for_each_entry(bdata, &bdata_list, list) {
> >> +		unsigned long remainder = 0;
> >> +
> >> +		if (addr < bdata->node_boot_start)
> >> +			continue;
> >> +
> >> +		if (PFN_DOWN(addr + size) > bdata->node_low_pfn)
> >> +			remainder = PFN_DOWN(addr + size) - bdata->node_low_pfn;
> >> +
> >> +		size -= PFN_PHYS(remainder);
> >> +		free_bootmem_core(bdata, addr, size);
> >> +
> >> +		if (!remainder)
> >> +			break;
> >> +
> >> +		addr = PFN_PHYS(bdata->node_low_pfn + 1);
> >> +	}
> >> }
> >>
> >> unsigned long __init free_all_bootmem(void)
> >
> > how about
> > 1. bdata is not sorted?
>
> They are kept in a sorted list. How could they be unsorted?
>
>
> > 2. Intel cross-node box: node0: 0g-2g, 4g-6g; node1: 2g-4g, 6g-8g. I
> > don't think they have two bdata structs for every node.
>
> How do the bdata structures represent this setup right now? Are you
> sure that there is not a node descriptor for every contiguous region?
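
For readers following the thread: the walk proposed above relies on each node
covering one contiguous span and on the spans being sorted and non-overlapping.
A minimal userspace sketch of that idea, under exactly those assumptions, might
look like the following. It is not the kernel code; struct span, struct piece
and split_free() are hypothetical names made up for illustration, and it
records the pieces instead of calling free_bootmem_core().

```c
#include <assert.h>
#include <stddef.h>

#define GB (1ULL << 30)

/* Hypothetical model of one node's memory: a single span [start, end). */
struct span {
	unsigned long long start, end;
};

/* One piece of the original range, attributed to one node. */
struct piece {
	int node;
	unsigned long long addr, size;
};

/*
 * Split [addr, addr + size) across spans that are ASSUMED to be sorted by
 * start address and non-overlapping. Returns the number of pieces written
 * to out. Stops early if the range runs into a hole between spans.
 */
static size_t split_free(const struct span *spans, size_t nr_spans,
			 unsigned long long addr, unsigned long long size,
			 struct piece *out, size_t max_out)
{
	unsigned long long end = addr + size;
	size_t n = 0;

	for (size_t i = 0; i < nr_spans && addr < end; i++) {
		unsigned long long chunk_end;

		if (addr >= spans[i].end)
			continue;	/* range starts beyond this node */
		if (addr < spans[i].start)
			break;		/* hole: no node covers addr */

		/* Clamp the piece to the end of this node's span. */
		chunk_end = end < spans[i].end ? end : spans[i].end;

		assert(n < max_out);
		out[n++] = (struct piece){ (int)i, addr, chunk_end - addr };

		/* Whatever is left continues in the next node. */
		addr = chunk_end;
	}
	return n;
}
```

With two nodes covering 0-2GB and 2-4GB, a request for 1GB-3GB comes out as
one piece per node. The sketch says nothing about layouts where a node's
memory is not a single contiguous span.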

http://lkml.org/lkml/2008/3/25/233

Subject: [patch] srat, x86_64: Add support for nodes spanning other nodes

For example, if the physical address layout on a two node system with 8 GB
memory is something like:
node 0: 0-2GB, 4-6GB
node 1: 2-4GB, 6-8GB

Current kernels fail to boot/detect this NUMA topology.

ACPI SRAT tables can expose such a topology which needs to be supported.

Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
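
To make the concern concrete, here is a small userspace sketch of the layout
quoted above. It assumes, purely for illustration, that each node is described
by one span running from its lowest to its highest address. The types and
helpers (struct node_model, node_span(), node_by_span(), node_by_range()) are
hypothetical and not kernel interfaces.

```c
#include <assert.h>

#define GB (1ULL << 30)
#define NR_NODES 2

/* Hypothetical model: a physical range [start, end) in bytes. */
struct mem_range {
	unsigned long long start, end;
};

struct node_model {
	int nid;
	struct mem_range ranges[2];
	int nr_ranges;
};

/* The interleaved two-node layout from the quoted patch description. */
static const struct node_model nodes[NR_NODES] = {
	{ 0, { { 0 * GB, 2 * GB }, { 4 * GB, 6 * GB } }, 2 },
	{ 1, { { 2 * GB, 4 * GB }, { 6 * GB, 8 * GB } }, 2 },
};

/* Collapse a node's ranges into one span: lowest start to highest end. */
static struct mem_range node_span(const struct node_model *n)
{
	assert(n->nr_ranges > 0);

	struct mem_range span = n->ranges[0];

	for (int i = 1; i < n->nr_ranges; i++) {
		if (n->ranges[i].start < span.start)
			span.start = n->ranges[i].start;
		if (n->ranges[i].end > span.end)
			span.end = n->ranges[i].end;
	}
	return span;
}

/* Lookup by span: first node whose single span contains addr, or -1. */
static int node_by_span(unsigned long long addr)
{
	for (int i = 0; i < NR_NODES; i++) {
		struct mem_range span = node_span(&nodes[i]);

		if (addr >= span.start && addr < span.end)
			return nodes[i].nid;
	}
	return -1;
}

/* Lookup by the actual ranges: the node that really owns addr, or -1. */
static int node_by_range(unsigned long long addr)
{
	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < nodes[i].nr_ranges; j++)
			if (addr >= nodes[i].ranges[j].start &&
			    addr < nodes[i].ranges[j].end)
				return nodes[i].nid;
	return -1;
}
```

In this model node 0 spans 0-6GB and node 1 spans 2-8GB, so the spans overlap
between 2GB and 6GB. An address at 3GB belongs to node 1, but a span-based
lookup that takes the first match attributes it to node 0.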

YH