Re: [BUG] set_mempolicy(MPOL_INTERLEAV) cause kernel panic

From: KAMEZAWA Hiroyuki
Date: Thu Jul 16 2009 - 22:41:11 EST


On Fri, 17 Jul 2009 11:07:09 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:

> > On Fri, 17 Jul 2009 09:04:46 +0900 (JST)
> > KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> >
> > > > On Wed, 15 Jul 2009, Lee Schermerhorn wrote:
> > > >
> > > > > Interestingly, on ia64, the top cpuset mems_allowed gets set to all
> > > > > possible nodes, while on x86_64, it gets set to on-line nodes [or nodes
> > > > > with memory]. Maybe this is to support hot-plug?
> > > > >
> > > >
> > > > numactl --interleave=all simply passes a nodemask with all bits set, so if
> > > > cpuset_current_mems_allowed includes offline nodes from node_possible_map,
> > > > then mpol_set_nodemask() doesn't mask them off.
> > > >
> > > > Seems like we could handle this strictly in mempolicies without worrying
> > > > about top_cpuset like in the following?
> > >
> > > This patch seems to be a band-aid. It will change memory-hotplug behavior.
> > > Please imagine the following scenario:
> > >
> > > 1. numactl interleave=all process-A
> > > 2. memory hot-add
> > >
> > > before 2.6.30:
> > > -> process-A can use hot-added memory
> > >
> > > your proposed patch:
> > > -> process-A can't use hot-added memory
> > >
> >
> > IMHO, the application itself should be notified to change its mempolicy by
> > a hot-plug script on the host. While an application uses interleave, a newly
> > hot-added node is just noise. I think "How pages are interleaved" should not
> > be changed implicitly. Then, checking at set_mempolicy() seems sane. If
> > notified, the application can do page migration and rebuild its mapping in
> > an ideal way.
>
> Do you really want ABI change?
>
No ;_

Hmm, IIUC, the current handling of the mempolicy nodemask is as below.
There should be 3 masks.
- the system's N_HIGH_MEMORY
- the mask the user specified via mempolicy() (remembered only when MPOL_F_RELATIVE)
- the cpuset's one

And pol->v.nodes is just a _cache_ of the logical AND of the above.
Synchronization with cpusets is guaranteed by the cpuset's generation.
Synchronization with N_HIGH_MEMORY should be guaranteed by a memory hotplug
notifier, but this is not implemented yet.
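
To make the three-mask model above concrete, here is a minimal userspace
sketch. It is not kernel code: nodemask64 and effective_nodes() are
hypothetical names, a 64-bit word stands in for nodemask_t, and the remapping
done for MPOL_F_RELATIVE_NODES is ignored. It only shows v.nodes as the AND of
the three masks.

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask64;

/*
 * Sketch of the "cache" idea: the effective node set is the logical AND
 * of the mask the user passed, the cpuset's allowed nodes, and the nodes
 * that currently have memory (the role N_HIGH_MEMORY plays).
 */
static nodemask64 effective_nodes(nodemask64 user, nodemask64 cpuset,
                                  nodemask64 has_memory)
{
    return user & cpuset & has_memory;
}
```

With "interleave=all" (all bits set), a cpuset mask that covers possible but
offline nodes, and memory only on nodes 0 and 1, the result keeps nodes 0 and
1 only, so the offline nodes never reach the interleave set.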

Then, what I can suggest here is...
- remember what the user requested. (only when MPOL_F_RELATIVE_NODES ?)
- add notifiers for memory hot-add. (only when MPOL_F_RELATIVE_NODES ?)
- add notifiers for memory hot-remove. (both MPOL_F_STATIC/RELATIVE_NODES ?)

IMHO, for cpusets, not recalculating v.nodes when MPOL_F_STATIC is set is good.
But for N_HIGH_MEMORY, v.nodes should be recalculated even if MPOL_F_STATIC is set.

Then, I think the mask the user passed should be remembered even if
MPOL_F_STATIC is set, and v.nodes should work as a cache and be updated in an
appropriate way.
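
The update rules I have in mind could be sketched roughly as follows. Again
this is a userspace model, not a patch: struct policy_sketch and the on_*()
functions are hypothetical, the static_nodes flag models MPOL_F_STATIC_NODES,
the cached field models pol->v.nodes, and relative-node remapping is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask64;

struct policy_sketch {
    nodemask64 user;    /* mask the user passed; always remembered */
    nodemask64 cpuset;  /* cpuset mask last applied to this policy */
    bool static_nodes;  /* models MPOL_F_STATIC_NODES */
    nodemask64 cached;  /* models pol->v.nodes */
};

/* Rebuild the cache from the remembered masks. */
static void recompute(struct policy_sketch *p, nodemask64 has_memory)
{
    p->cached = p->user & p->cpuset & has_memory;
}

static void policy_init(struct policy_sketch *p, nodemask64 user,
                        bool static_nodes, nodemask64 cpuset,
                        nodemask64 has_memory)
{
    p->user = user;
    p->cpuset = cpuset;
    p->static_nodes = static_nodes;
    recompute(p, has_memory);
}

/* cpuset change: a static policy keeps its cached nodes untouched. */
static void on_cpuset_change(struct policy_sketch *p, nodemask64 cpuset,
                             nodemask64 has_memory)
{
    if (p->static_nodes)
        return;
    p->cpuset = cpuset;
    recompute(p, has_memory);
}

/* memory hot-add/remove: always recompute, even for a static policy. */
static void on_memory_hotplug(struct policy_sketch *p, nodemask64 has_memory)
{
    recompute(p, has_memory);
}
```

Because the user's mask is kept, a hot-added node that the user asked for can
come back into the cached set without the application calling set_mempolicy()
again.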

Thanks,
-Kame