Re: [PATCH] fs, mm: account filp and names caches to kmemcg

From: Michal Hocko
Date: Wed Oct 25 2017 - 10:12:31 EST


On Wed 25-10-17 09:11:51, Johannes Weiner wrote:
> On Wed, Oct 25, 2017 at 09:15:22AM +0200, Michal Hocko wrote:
[...]
> > ... we shouldn't make it more loose though.
>
> Then we can end this discussion right now. I pointed out right from
> the start that the only way to replace -ENOMEM with OOM killing in the
> syscall is to force charges. If we don't, we either deadlock or still
> return -ENOMEM occasionally. Nobody has refuted that this is the case.

Yes, this is true. I guess we are back to the non-failing allocations
discussion... Currently we are too ENOMEM happy for memcg !PF paths, which
can lead to the weird issues Greg has pointed out earlier. Going in the
opposite direction, to basically never ENOMEM and rather pretend a success
(which allows runaways for extreme setups with no oom eligible tasks),
sounds like going from one extreme to another. This basically means that
those charges will effectively be GFP_NOFAIL. Too much to guarantee IMHO.
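
To make the two extremes concrete, here is a toy userspace model. It is
not kernel code: struct toy_memcg, toy_try_charge() and the policy names
are all made up for illustration, and "reclaimable" simply stands in for
whatever reclaim or an OOM kill could free. It is only a rough sketch of
the trade-off, assuming a single counter and no concurrency.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy counter; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	size_t usage;		/* pages currently charged */
	size_t limit;		/* hard limit in pages */
	size_t reclaimable;	/* pages that reclaim or an OOM kill could free */
};

/* Hypothetical policies for a charge that still does not fit after reclaim. */
enum toy_policy {
	TOY_FAIL_ENOMEM,	/* give up and let -ENOMEM reach the caller */
	TOY_FORCE_CHARGE,	/* never fail: charge past the limit */
};

static int toy_try_charge(struct toy_memcg *cg, size_t nr_pages,
			  enum toy_policy policy)
{
	size_t need, freed;

	assert(nr_pages > 0);
	assert(cg->reclaimable <= cg->usage);

	/* Fast path: the charge fits under the limit. */
	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}

	/* Slow path: free as much of the shortfall as possible. */
	need = cg->usage + nr_pages - cg->limit;
	freed = need < cg->reclaimable ? need : cg->reclaimable;
	cg->usage -= freed;
	cg->reclaimable -= freed;

	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;
		return 0;
	}

	/* Nothing left to free: this is where the two extremes differ. */
	if (policy == TOY_FORCE_CHARGE) {
		cg->usage += nr_pages;	/* usage now exceeds the limit */
		return 0;
	}
	return -ENOMEM;
}
```

With nothing reclaimable, TOY_FAIL_ENOMEM returns -ENOMEM while
TOY_FORCE_CHARGE succeeds and lets usage run past the limit, which is the
runaway case mentioned above.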

> > > The current thread can loop in syscall exit until
> > > usage is reconciled (either via reclaim or kill). This seems consistent
> > > with pagefault oom handling and compatible with overcommit use case.
> >
> > But we do not really want to make the syscall exit path any more complex
> > or more expensive than it is. The point is that we shouldn't be afraid
> > of triggering the oom killer from the charge path because we do have an
> > async OOM killer. This is the very same as the standard allocator path.
> > So why should memcg be any different?
>
> I have nothing against triggering the OOM killer from the allocation
> path. I am dead-set against making the -ENOMEM return from syscalls
> rare and unpredictable.

Isn't that the case when we put memcg out of the picture already? More
on that below.

> They're a challenge as it is. The only sane options are to stick with
> the status quo,

One thing that really worries me about the current status quo is that
the behavior depends on whether you run under memcg or not. The global
policy is "almost never fail unless something horrible is going on".
But we _do not_ guarantee that ENOMEM stays inside the kernel.

So if we need to do something about that I would think we need a
universal solution rather than something memcg specific. Sure, global
ENOMEMs are so rare that probably nobody will trigger them, but relying
on that is just wishful thinking...

So how about we start with a BIG FAT WARNING for the failure case?
Something resembling warn_alloc.
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5d9323028870..3ba62c73eee5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1547,9 +1547,14 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 	 * victim and then we have rely on mem_cgroup_oom_synchronize otherwise
 	 * we would fall back to the global oom killer in pagefault_out_of_memory
 	 */
-	if (!memcg->oom_kill_disable &&
-	    mem_cgroup_out_of_memory(memcg, mask, order))
-		return true;
+	if (!memcg->oom_kill_disable) {
+		if (mem_cgroup_out_of_memory(memcg, mask, order))
+			return true;
+
+		WARN(!current->memcg_may_oom,
+				"Memory cgroup charge failed because of no reclaimable memory! "
+				"This looks like a misconfiguration or a kernel bug.");
+	}
 
 	if (!current->memcg_may_oom)
 		return false;
--
Michal Hocko
SUSE Labs