Re: [-mm] Add an owner to the mm_struct (v2)

From: Balbir Singh
Date: Fri Mar 28 2008 - 10:56:21 EST


Paul Menage wrote:
> On Fri, Mar 28, 2008 at 5:36 AM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>> > - you hold task_lock() for the new owner (which is necessary anyway to
>> > ensure that the new owner's mm doesn't change while you're updating
>> > mm->owner)
>> >
>>
>> tsk->mm should not change unless the task is exiting or when a kernel thread
>> does use_mm() (PF_BORROWED_MM).
>>
>
> Or the task calls execve().
>
>> I see mm->owner changing when
>>
>> 1. The mm->owner exits
>> 2. At fork time for clone calls with CLONE_VM
>

This was supposed to be deleted; please ignore it.

> Why would a clone() call with CLONE_VM change mm->owner? We're sharing
> with the existing owner, who is still using the mm, so we should leave
> it as it is.
>
> I don't see what invariant you're trying to protect with
> mm->owner_lock. Can you explain it?
>

mm->owner_lock is there to protect the mm->owner field from being changed
simultaneously as tasks fork and exit.

>> >> @@ -1357,6 +1379,10 @@ static struct task_struct *copy_process(
>> >> write_unlock_irq(&tasklist_lock);
>> >> proc_fork_connector(p);
>> >> cgroup_post_fork(p);
>> >> +
>> >> + if (!(clone_flags & CLONE_VM) && (p != p->group_leader))
>> >> + mm_fork_init_owner(p);
>> >> +
>> >
>> > I'm not sure I understand what this is doing.
>> >
>> > I read it as "if p has its own mm and p is a child thread, set
>> > p->mm->owner to p->group_leader". But by definition if p has its own
>> > mm, then p->group_leader->mm will be different to p->mm, therefore
>> > we'd end up with mm->owner->mm != mm, which seems very bad.
>> >
>> > What's the intention of this bit of code?
>> >
>>
>> The intention is to handle the case when clone is called without CLONE_VM and
>> with CLONE_THREAD. This means that p can have its own mm and a shared group_leader.
>>
>
> In which case p should be the owner of its new mm, not
> p->group_leader. mm_fork_init_owner() does:
>

Oh! yes... my bad again. The check should have been p == p->group_leader, but
that is not required either. The check should now ideally be

if (!(clone_flags & CLONE_VM))
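i.e. the hunk in copy_process() would shrink to something like this (a sketch
against the patch quoted above, not a tested diff; it assumes copy_mm() has
already set mm->owner = p for the fresh mm, per Paul's observation):

```
	cgroup_post_fork(p);

	/*
	 * A !CLONE_VM child received a fresh mm in copy_mm(), so no
	 * group_leader fixup is needed, even for CLONE_THREAD children.
	 */
	if (!(clone_flags & CLONE_VM))
		mm_fork_init_owner(p);
```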

>> + if (mm->owner != p)
>> + rcu_assign_pointer(mm->owner, p->group_leader);
>
> Since we just created a fresh mm for p in copy_mm(), and set mm->owner
> = p, I don't see how this test can succeed. And if it does, then we
> end up setting p->mm->owner to p->group_leader, which has to be bad
> since we know that:
>
> - p is the only user of its mm
> - p != p->group_leader
>
> therefore we're setting p->mm->owner to point to a process that
> doesn't use p->mm. What am I missing?
>
>> >> +/*
>> >> + * Task p is exiting and it owned mm, so let's find a new owner for it
>> >> + */
>> >> +static inline int
>> >> +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
>> >> +{
>> >> + int ret;
>> >> +
>> >> + rcu_read_lock();
>> >> + ret = (mm && (rcu_dereference(mm->owner) == p) &&
>> >> + (atomic_read(&mm->mm_users) > 1));
>> >> + rcu_read_unlock();
>> >> + return ret;
>> >
>> > The only way that rcu_read_lock() helps here is if mm freeing is
>> > protected by RCU, which I don't think is the case.
>> >
>>
>> rcu_read_lock() also ensures that preemption does not cause us to see incorrect
>> values.
>>
>
> The only thing that RCU is protecting us against here is that if
> mm->owner is non-NULL, the task it points to won't be freed until the
> end of our RCU read section. But since we never dereference mm->owner
> in the RCU read section, that buys us nothing and is just confusing.
>
>> > But as long as p==current, there's no race, since no other process
>> > will re-point mm->owner at themselves, so mm can't go away anyway
>> > since we have a reference to it that we're going to be dropping soon.
>> >
>>
>> mm cannot go away, but mm->owner can be different from current and could be
>> going away.
>
> But that's a problem for mm->owner, not for current.
>
> By this point we've already cleared current->mm, so we're no longer a
> candidate for becoming mm->owner if we're not already the owner.
> Therefore the truth or falsity of (mm->owner == current) can't be
> changed by races - either we are already the owner and no-one but us
> can change mm->owner, or we're not the owner and no one can point
> mm->owner at us.
>
>> >> +
>> >> + if (!mm_need_new_owner(mm, p))
>> >> + return;
>> >> +
>> >> + /*
>> >> + * Search in the children
>> >> + */
>> >> + list_for_each_entry(c, &p->children, sibling) {
>> >> + if (c->mm == p->mm)
>
> Oh, and at the point in exit_mm() when you call
> mm_update_next_owner(), p->mm is NULL, so this will only match against
> tasks that have no mm.
>

Yes.. I think we need to call it earlier.
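Concretely, exit_mm() would have to run the update while tsk->mm is still set,
roughly like this (a sketch only, not a tested patch; mm_update_next_owner() as
in the patch under discussion):

```
static void exit_mm(struct task_struct *tsk)
{
	struct mm_struct *mm = tsk->mm;
	...
	/*
	 * Pick a new owner while tsk->mm still points at mm, so that
	 * the c->mm == p->mm comparisons in the search can match.
	 */
	mm_update_next_owner(mm);
	task_lock(tsk);
	tsk->mm = NULL;
	...
}
```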

>> >> + goto assign_new_owner;
>> >> + }
>> >
>> > We need to keep checking mm_need_new_owner() since it can become false
>> > if the only other user of the mm exits at the same time that we do.
>> > (In which case there's nothing to do).
>> >
>>
>> I would rather deal with the case where mm->owner is NULL, rather than keep
>> checking
>
> I mean that it's possible that there's one other task using mm at the
> point when we enter mm_update_next_owner(), but while we're trying to
> find it, it does an exit() or execve() and stops being a user of mm.
> if that happens, we should bail from the search since we no longer
> need to find another user to pass it to (and there is no other user to
> find).
>
>> (since even with constant checking we cannot guarantee that mm->owner
>> will not become NULL)
>>
>>
>> >> + * Search through everything else. We should not get
>> >> + * here often
>> >> + */
>> >> + for_each_process(c) {
>> >> + g = c;
>> >> + do {
>> >> + if (c->mm && (c->mm == p->mm))
>> >> + goto assign_new_owner;
>> >> + } while ((c = next_thread(c)) != g);
>> >> + }
>> >
>> > Is there a reason to not code this as for_each_thread?
>> >
>>
>> Is there a for_each_thread()?
>>
>
> Sorry, I meant do_each_thread()/while_each_thread(). Isn't that
> basically the same thing as what you've explicitly coded here?
>

Yes, it is.
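For reference, the open-coded search above rewritten with the existing helpers
(a sketch; g and c as in the quoted loop):

```
	do_each_thread(g, c) {
		if (c->mm && c->mm == p->mm)
			goto assign_new_owner;
	} while_each_thread(g, c);
```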

>> >> +assign_new_owner:
>> >> + spin_lock(&mm->owner_lock);
>> >> + rcu_assign_pointer(mm->owner, c);
>> >> + spin_unlock(&mm->owner_lock);
>> >> +}
>> >
>> > This can break if c is also exiting and has passed the call to
>> > mm_update_next_owner() by the time we assign mm->owner. That's why my
>> > original suggested version had a function like:
>> >
>>
>> Won't it be better to check for c->flags & PF_EXITING?
>>
>
> Or if c is execing and gives up the mm that way.
>



>> > static inline void try_give_mm_ownership(struct task_struct *task,
>> > struct mm_struct *mm) {
>> > if (task->mm != mm) return;
>> > task_lock(task);
>> > if (task->mm == mm) {
>> > mm->owner = task;
>> > }
>> > task_unlock(task);
>> > }
>> >
>> > i.e. determining that a task is a valid candidate and updating the
>> > owner pointer has to be done in the same critical section.
>> >
>>
>> Let me try to revamp the locking rules and see what that leads to. But I don't
>> like protecting an mm_struct's member with a task_struct's lock.
>
> We're not protecting mm->owner with task_lock(new_owner). As I said,
> the locking rule for mm->owner should that it can only be changed if
> (current==mm->owner). What we're protecting is new_owner->mm - i.e.
> we're ensuring that we don't break the invariant that (mm->owner->mm
> == mm)
>

But there is no way to guarantee that: what if the new_owner execs after we've
done the check and the assignment? Won't we end up breaking the invariant? How
about having a mm_update_next_owner() call in exec_mmap() as well? That way, we
can still use owner_lock and keep the invariant.


>> >> config CGROUP_MEM_RES_CTLR
>> >> bool "Memory Resource Controller for Control Groups"
>> >> - depends on CGROUPS && RESOURCE_COUNTERS
>> >> + depends on CGROUPS && RESOURCE_COUNTERS && MM_OWNER
>> >
>> > Maybe this should select MM_OWNER rather than depending on it?
>> >
>>
>> I thought of it, but wondered if the user should make an informed choice about
>> MM_OWNER and the overhead it brings along.
>
> I would vote the other way on that - make it clear in the memory
> controller config help that the memory controller can introduce
> overhead, and have it implicitly enable MM_OWNER.
>

That's a possibility.
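For reference, the select-based variant Paul is suggesting would read roughly
like this (Kconfig sketch; the help text wording is mine):

```
config CGROUP_MEM_RES_CTLR
	bool "Memory Resource Controller for Control Groups"
	depends on CGROUPS && RESOURCE_COUNTERS
	select MM_OWNER
	help
	  The memory controller tracks the owning task of each mm and
	  can therefore add per-mm overhead; see MM_OWNER.
```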

Thanks for your patience and review.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL