Re: [PATCH] cpuset: fix allocating page cache/slab object on the unallowed node when memory spread is set

From: Nick Piggin
Date: Wed Feb 11 2009 - 19:54:53 EST


On Tuesday 10 February 2009 22:37:28 Paul Menage wrote:
> On Sun, Feb 8, 2009 at 8:02 PM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> > Is it a problem if mems_allowed can get sampled in an unsafe way?
>
> It could cause an OOM to be incorrectly triggered if the task didn't
> see all the mems that it was meant to have access to. (In the extreme
> case, it could see no mems at all).

You can't be overly worried about concurrency cases. We're talking about
two threads here, one chancing mems_allowed, and the other performing
an allocation.

It would be possible, depending on timing, for the allocating thread to
see either pre or post mems_allowed even if access was fully locked.

The only difference is that a partially changed mems_allowed could be
seen. But what does this really mean? Some combination of the new and
the old nodes. I don't think this is too much of a problem.


> > It will happen only quite rarely. This code seems to be far simpler
> > and more robust than the current fragile scheme, so it would be
> > nice to use it.
>
> Agreed, the existing task->mems_allowed / cpuset_update_memory_state()
> code is a bit unwieldy. It's pretty much all inherited from the
> original cpusets code.
>
> A nicer way to do it might be:
>
> - get rid of task->mems_allowed entirely
>
> - have cpuset->mems_allowed be a pointer to an immutable RCU-protected
> nodemask (so updating the nodemask for a cpuset would always replace
> it with a fresh one and RCU-free the old one)
>
> - make cpuset_zone_allowed_*() use rcu_read_lock() and just check
> *(task_cs(current)->mems_allowed) rather than current->mems_allowed
>
> - ensure that get_page_from_freelist() is also RCU-safe (right now I
> think it's not since cpuset_zone_allowed_softwall() can sleep, but I
> think cgroups/cpusets is sufficiently RCU-safe now that we could quite
> likely remove the mutex lock from cpuset_zone_allowed_*()
>
> - add an rcu_read_lock() in hugetlb.c:cpuset_mems_nr()

This could work if we *really* need an atomic snapshot of mems_allowed.
seqcount synchronisation would be an alternative too that could allow
sleeping more easily than SRCU (OTOH if you don't need sleeping, then
RCU should be faster than seqcount).

But I'm not convinced we do need this to be atomic.


> - figure out where the mempolicy updates currently done in
> cpuset_update_task_memory_state() need to occur - this is part of the
> code that I'm pretty fuzzy on. (Maybe we can just copy the bits from
> Miao's patch for this?)

Basically we want to push all that into the sites where the memory policy
is actually changed. So yes they should all go away.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/