Re: [discuss] mmap, mbind and write to mmap'ed memory crashes 2.6.16-rc1[2] on 2 node X86_64

From: Bharata B Rao
Date: Wed Feb 08 2006 - 23:34:01 EST


On Wed, Feb 08, 2006 at 05:06:26PM +0100, Andi Kleen wrote:
> On Wednesday 08 February 2006 16:59, Christoph Lameter wrote:
> > On Wed, 8 Feb 2006, Andi Kleen wrote:
> >
> > > On Wednesday 08 February 2006 16:42, Christoph Lameter wrote:
> > >
> > > > However, this has implications for policy_zone. This variable should store
> > > > the zone that policies apply to. However, in your case this zone will vary
> > > > which may lead to all sorts of weird behavior even if we fix
> > > > bind_zonelist. To which zone does policy apply? ZONE_NORMAL or ZONE_DMA32?
> > >
> > > It really needs to apply to both (currently you can't police 4GB of your
> > > memory on x86-64) But I haven't worked out a good design how to implement it yet.
> >
> > So a provisional solution would be to simply ignore empty zones in
> > bind_zonelist?
>
> That would likely prevent the crash yes (Bharata can you test?)

With this solution, the kernel doesn't crash, but the application does.

Shouldn't we fail mbind if we can't bind any zones ?
Something like this...


Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxx>

--- linux-2.6.16-rc2/mm/mempolicy.c.orig 2006-02-09 01:34:37.000000000 -0800
+++ linux-2.6.16-rc2/mm/mempolicy.c 2006-02-09 01:39:32.000000000 -0800
@@ -143,8 +143,18 @@
if (!zl)
return NULL;
num = 0;
- for_each_node_mask(nd, *nodes)
- zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+ for_each_node_mask(nd, *nodes) {
+ struct zone *zone = &NODE_DATA(nd)->node_zones[policy_zone];
+
+ if (zone->present_pages)
+ zl->zones[num++] = zone;
+ }
+
+ if (!num) {
+ /* failed to bind even a single zone */
+ kfree(zl);
+ return NULL;
+ }
zl->zones[num] = NULL;
return zl;
}

>
> But of course it still has the problem of a lot of memory being unpolicied
> on machines with >4GB if there's both DMA32 and NORMAL.
>
> > Or fall back to earlier zones (which includes unpolicied
> > zones in the bind zone list?)
>

Does it make sense to have a separate policy_zone for each node so that we
have atleast one(highest) zone in a node which comes under memory policy ?

Regards,
Bharata.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/