Re: [PATCH -v2 -mm] memcg: do not trigger OOM from add_to_page_cache_locked

From: Johannes Weiner
Date: Wed Nov 28 2012 - 13:44:37 EST


On Wed, Nov 28, 2012 at 05:48:24PM +0100, Michal Hocko wrote:
> On Wed 28-11-12 17:46:40, Michal Hocko wrote:
> > On Wed 28-11-12 11:37:36, Johannes Weiner wrote:
> > > On Wed, Nov 28, 2012 at 05:04:47PM +0100, Michal Hocko wrote:
> > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > > index 095d2b4..5abe441 100644
> > > > --- a/include/linux/memcontrol.h
> > > > +++ b/include/linux/memcontrol.h
> > > > @@ -57,13 +57,14 @@ extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
> > > > gfp_t gfp_mask);
> > > > /* for swap handling */
> > > > extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> > > > - struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
> > > > + struct page *page, gfp_t mask, struct mem_cgroup **memcgp,
> > > > + bool oom);
> > >
> > > Ok, now I feel almost bad for asking, but why the public interface,
> > > too?
> >
> > Would it work out if I told you it was to double check that your review
> > quality has not decreased after that many revisions? :P

Deal.

> >From e21bb704947e9a477ec1df9121575c606dbfcb52 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxx>
> Date: Wed, 28 Nov 2012 17:46:32 +0100
> Subject: [PATCH] memcg: do not trigger OOM from add_to_page_cache_locked
>
> The memcg OOM killer might deadlock if the process which falls down to
> mem_cgroup_handle_oom holds a lock which prevents another task from
> terminating, because that task is blocked on the very same lock.
> This can happen when a write system call needs to allocate a page but
> the allocation hits the memcg hard limit and there is nothing to reclaim
> (e.g. there is no swap, or the swap limit has been hit as well and all
> cache pages have already been reclaimed), while the process selected by
> the memcg OOM killer is blocked on the i_mutex of the same inode
> (e.g. it is truncating it).
>
> Process A
> [<ffffffff811109b8>] do_truncate+0x58/0xa0 # takes i_mutex
> [<ffffffff81121c90>] do_last+0x250/0xa30
> [<ffffffff81122547>] path_openat+0xd7/0x440
> [<ffffffff811229c9>] do_filp_open+0x49/0xa0
> [<ffffffff8110f7d6>] do_sys_open+0x106/0x240
> [<ffffffff8110f950>] sys_open+0x20/0x30
> [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Process B
> [<ffffffff8110a9c1>] mem_cgroup_handle_oom+0x241/0x3b0
> [<ffffffff8110b5ab>] T.1146+0x5ab/0x5c0
> [<ffffffff8110c22e>] mem_cgroup_cache_charge+0xbe/0xe0
> [<ffffffff810ca28c>] add_to_page_cache_locked+0x4c/0x140
> [<ffffffff810ca3a2>] add_to_page_cache_lru+0x22/0x50
> [<ffffffff810ca45b>] grab_cache_page_write_begin+0x8b/0xe0
> [<ffffffff81193a18>] ext3_write_begin+0x88/0x270
> [<ffffffff810c8fc6>] generic_file_buffered_write+0x116/0x290
> [<ffffffff810cb3cc>] __generic_file_aio_write+0x27c/0x480
> [<ffffffff810cb646>] generic_file_aio_write+0x76/0xf0 # takes ->i_mutex
> [<ffffffff8111156a>] do_sync_write+0xea/0x130
> [<ffffffff81112183>] vfs_write+0xf3/0x1f0
> [<ffffffff81112381>] sys_write+0x51/0x90
> [<ffffffff815b5926>] system_call_fastpath+0x18/0x1d
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> This is not a hard deadlock, though, because the administrator can still
> intervene and increase the limit on the group, which helps the writer
> finish the allocation and release the lock.
>
> This patch fixes the problem by forbidding OOM from page cache charges
> (namely add_to_page_cache_locked). mem_cgroup_cache_charge grows an oom
> argument which is pushed down the call chain.
>
> As a possibly visible result, add_to_page_cache_lru might fail more often
> with ENOMEM, but this is to be expected if the limit is set, and it is
> preferable to the OOM killer IMO.
>
> Changes since v1
> - do not abuse gfp_flags; rather use the oom parameter directly, as per
> Johannes
> - also handle shmem write faults and fallocate properly, as per Johannes
>
> Reported-by: azurIt <azurit@xxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
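
As an aside for readers following the design rather than the exact diff:
the pattern in the quoted changelog is simply to thread an oom flag from
the charge entry points down to the place where the per-memcg OOM handler
would otherwise run, so the page-cache path bails out with -ENOMEM instead
of blocking there while its caller holds i_mutex. Below is a minimal,
self-contained userspace sketch of that shape; all names in it are
hypothetical stand-ins, not the real mm/memcontrol.c symbols.

/*
 * Illustrative userspace sketch only -- not the real mm/memcontrol.c code.
 * It models the shape of the change: an "oom" flag travels from the charge
 * entry points down to the spot where the memcg OOM handler would otherwise
 * be entered, so the page-cache path returns -ENOMEM instead of blocking.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct mem_cgroup {
	long usage;
	long limit;
};

/* Stub: pretend reclaim freed nothing, i.e. the group is at its hard limit. */
static bool sketch_reclaim(struct mem_cgroup *memcg)
{
	(void)memcg;
	return false;
}

/* Stub for the handler that may block for a long time in the real kernel. */
static void sketch_handle_oom(struct mem_cgroup *memcg)
{
	(void)memcg;
}

static int sketch_try_charge(struct mem_cgroup *memcg, long nr_pages, bool oom)
{
	if (memcg->usage + nr_pages <= memcg->limit || sketch_reclaim(memcg)) {
		memcg->usage += nr_pages;
		return 0;
	}
	if (!oom)
		return -ENOMEM;		/* page-cache charge: fail, do not OOM */
	sketch_handle_oom(memcg);	/* anon faults etc. keep the old behaviour */
	return -ENOMEM;
}

/* What the add_to_page_cache_locked path would call: never triggers OOM. */
static int sketch_cache_charge(struct mem_cgroup *memcg, long nr_pages)
{
	return sketch_try_charge(memcg, nr_pages, false);
}

int main(void)
{
	struct mem_cgroup memcg = { .usage = 100, .limit = 100 };

	if (sketch_cache_charge(&memcg, 1) == -ENOMEM)
		printf("charge failed with -ENOMEM, no memcg OOM invoked\n");
	return 0;
}

In the real patch the same flag is threaded through mem_cgroup_cache_charge
(and the swapin helper quoted earlier in the thread); the !oom early return
is the only behavioural change.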

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Thanks, Michal!