Re: [PATCH 0/2] mm->owner to mm->memcg fixes

From: Michal Hocko
Date: Fri Jun 01 2018 - 10:07:12 EST


On Thu 31-05-18 13:41:38, Eric W. Biederman wrote:
> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>
> > On Thu 24-05-18 14:16:35, Andrew Morton wrote:
> >> On Thu, 24 May 2018 13:10:02 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >>
> >> > I would really prefer and appreciate a repost with all the fixes folded
> >> > in.
> >>
> >> [1/2]
> >
> > Thanks Andrew and sorry it took so long! This seems to be missing the
> > fix for the issue I've mentioned in http://lkml.kernel.org/r/20180510121838.GE5325@xxxxxxxxxxxxxx
> > and Eric wanted to handle by http://lkml.kernel.org/r/87wovu889o.fsf@xxxxxxxxxxxxx
> > I do not think that this fix is really correct one though. I will
> > comment in a reply to that email.
>
> Agreed. That was not a correct fix to the possible infinite loop in
> get_mem_cgroup_from_mm. Which in net leaves all of this broken and
> not ready to be merged.
>
> > In any case I really think this should better be reposted in one patch
> > series and restart the discussion. I strongly suggest revisiting my
> > previous attempt http://lkml.kernel.org/r/1436358472-29137-8-git-send-email-mhocko@xxxxxxxxxx
> > including the follow up discussion regarding the unfortunate CLONE_VM
> > outside of thread group case and potential solution for that.
>
> With my fix for get_mem_cgroup_from_mm falling flat that limits our
> possible future directions:
> - Make memory cgroups honest and require all tasks of an mm to be in the
> same memory cgroup. [BEST]

Agreed. This should be possible with the multi-pid migration support.
Just do not allow to migrate if the set of pids doesn't contain all
processes sharing the same mm. We can add a special MMF_ flag for mm
that is generated by CLONE_VM without CLONE_THREAD to make the normal
path fast.

Or we can simply fail the migration for this odd cases and implement it
only if somebody actually complains. The flag would be useful for other
cases (e.g. in the oom path where we have to handle these tasks as well).

[...]
> >> - /* We move charges only when we move a owner of the mm */
> >> - if (mm->owner == p) {
> c>> +
> >> + /* We move charges except for creative uses of CLONE_VM */
> >> + if (mm->memcg == from) {
> >
> > I must be missing something here but how do you prevent those cases?
> > AFAICS processes sharing the mm will simply allow to migrate.
>
> The mm will only migrate once per set of tasks. A task that does not
> migrate with the mm will not have mm->memcg == from. Which is all I was
> referring to. Perhaps I did not say that well.

OK, I see now (I guess). I would probably just drop the comment. The
code is much more clear when dealing with the memcg than a task...

> >> VM_BUG_ON(mc.from);
> >> VM_BUG_ON(mc.to);
> >> VM_BUG_ON(mc.precharge);
> >
> > Other than that the patch makes sense to me.
>
> I will take a bit and respin this. Because of algorithmic bug-fix
> nature of where this started I want to avoid depending on fixing the
> semantics. With my third option for fixing get_mem_cgroup_from_mm I see
> how to do that now. Then I will include a separate patch to fix the
> semantics.

Thanks!
--
Michal Hocko
SUSE Labs