Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()

From: Yinghai Lu
Date: Tue Apr 15 2008 - 14:52:24 EST


On Tue, Apr 15, 2008 at 4:51 AM, Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
> Hi,
>
>
>
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> wrote:
> >
> >> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
> >> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> >
> >> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> wrote:
> >> >
> >> > > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
> >> > > <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >> > > >
> >> > > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> >> > > >
> >> > > > > Johannes Weiner <hannes@xxxxxxxxxxxx> writes:
> >> > > > >
> >> > > > > > Make free_bootmem() look up the node holding the specified address
> >> > > > > > range which lets it work transparently on single-node and multi-node
> >> > > > > > configurations.
> >> > > > >
> >> > > > > Acked-by: Andi Kleen <andi@xxxxxxxxxxxxxx>
> >> > > > >
> >> > > > > This is far better than the original change it replaces and which
> >> > > > > I also objected to in review.
> >> > > > >
> >> > > >
> >> > > > So... do we think these two patches are sufficiently safe and important for
> >> > > > 2.6.25?
> >> > >
> >> > > the patch is wrong
> >> > >
> >> >
> >> > The last I saw was this:
> >> >
> >> >
> >> > On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > "Yinghai Lu" <yhlu.kernel@xxxxxxxxx> writes:
> >> > >
> >> > > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@xxxxxxxxxxxx> wrote:
> >> > > > ...
> >> >
> >> > > >
> >> > > > could have chance that bootmem with reserved_early that is crossing
> >> > > > the nodes.
> >> > >
> >> > > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
> >> > > nodes, so I don't see where this chance could come from.
> >> >
> >> > Is that what you're referring to?
> >> >
> >> > Was Johannes observation incorrect? If so, why?
> >>
> >> my patch with free_bootmem will make sure free_bootmem_core only free
> >> bootmem in the bdata scope.
> >> so free_bootmem can handle the cross_node bootmem that is done by
> >> reserve_early ( done in another patch, is dropped by you because took
> >> Jonannes).
> >>
> >> in setup_arch for x86_64 there is one free_bootmem that is used when
> >> ramdisk is falled out of ram map. that could be crossed by bootloader
> >> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
> >> memory.
> >>
> >> anyway that is extrem case, but my patch could handle that.
>
> Has this case ever occured? If this could become real, I have no
> objections to implement a way to handle it (why would I?), but until now
> you just said that in some time in the future, this could be useful.
>
>
> >>
> >> I wonder if any regression caused by my previous patch? maybe on other platform?
> >>
> >
> > Not that I'm aware of.
>
> It papers over buggy usage of free_bootmem(). If its arguments are
> bogus, it will just return again where it BUG()ed out before. The pages
> might be never marked free and therefor never reach the buddy allocator.
>
>
> > I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch. Johannes,
> > can you please check 2.6.28-rc8-mm2, see if it looks OK?
>
> I object to the way it is implemented. If it is really needed, that
> should be done properly:
>
> - remove the double loop over the area on the likely succeeding
> path and unroll the reserving on the unlikely path as it was
> done before. Better to punish exceptional branches than
> the working paths.
> - make reserve_bootmem_core be strict with its arguments. If
> you want to iterate over the bdata list, you should not just
> throw every item at the _core functions and let them work it
> out for themselves. The correct parameters should be
> calculated in advance and then passed to a strict
> _bootmem_core() function that BUG()s on failure.
>
> But still, Yinghai, you never brought in practical reasons for this
> whole thing. You talked about extreme and theoretical cases and I don't
> think that this justifies breaking API or pessimizing code at all.

free_bootmem(ramdisk_image, ramdisk_size) is sitting in setup_arch of
x86_64. or make that panic directly.

what i needed is: free_bootmem can free bootmem cross the nodes.

on numa
alloc_bootmem always return blocks on same nodes. but some via
reserve_early and then to bootmem via early_res_to_bootmem could be
crossing nodes.

BTW, can you look at patches in -mm about make reserve_bootmem cross the nodes?

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/