Re: [RFC PATCH 2/2] mm, oom: do not trigger out_of_memory from the #PF

From: Tetsuo Handa
Date: Mon Jun 12 2017 - 06:48:28 EST


Michal Hocko wrote:
> On Sat 10-06-17 20:57:46, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > And just to clarify a bit. The OOM killer should be invoked whenever
> > > appropriate from the allocation context. If we decide to fail the
> > > allocation in the PF path then we can safely roll back and retry the
> > > whole PF. This has an advantage that any locks held while doing the
> > > allocation will be released and that alone can help to make a further
> > > progress. Moreover we can relax retry-for-ever _inside_ the allocator
> > > semantic for the PF path and fail allocations when we cannot make
> > > further progress even after we hit the OOM condition or we do stall for
> > > too long.
> >
> > What!? Are you saying that leave the allocator loop rather than invoke
> > the OOM killer if it is from page fault event without __GFP_FS set?
> > With below patch applied (i.e. ignore __GFP_FS for emulation purpose),
> > I can trivially observe systemwide lockup where the OOM killer is
> > never called.
>
> Because you have ruled the OOM out of the game completely from the PF
> path AFICS.

Yes, I know.

> So that is clearly _not_ what I meant (read the second
> sentence). What I meant was that page fault allocations _could_ fail
> _after_ we have used _all_ the reclaim opportunities.

I used this patch for demonstrating what will happen if page fault
allocations failed but the OOM killer does not trigger.

> Without this patch
> this would be impossible.

What I wanted to say is that, with this patch, you are introducing possibility
of lockup. "Retrying the whole page fault path when page fault allocations
failed but the OOM killer does not trigger" helps nothing. It will just spin
wasting CPU time until somebody else invokes the OOM killer.

> Note that I am not proposing that change now
> because that would require a deeper audit but it sounds like a viable
> way to go long term.

I don't think introducing possibility of "page fault allocations failed
but the OOM killer does not trigger" makes sense. Thus, this patch does not
make sense unless we invoke the OOM killer before returning VM_FAULT_OOM.