Re: [patch -mm 4/9 v2] oom: remove compulsory panic_on_oom mode

From: KAMEZAWA Hiroyuki
Date: Tue Feb 16 2010 - 22:24:48 EST


On Tue, 16 Feb 2010 18:58:17 -0800 (PST)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > We want to lock all populated zones with ZONE_OOM_LOCKED to avoid
> > > needlessly killing more than one task regardless of how many memcgs are
> > > oom.
> > >
> > The current implementation achieves what memcg wants. Why remove it and break memcg?
> >
>
> I've updated my patch to not take ZONE_OOM_LOCKED for any zones on memcg
> oom. I'm hoping that you will add sysctl_panic_on_oom == 2 for this case
> later, however.
>
I'll write panic_on_oom support for memcg later.

> > What I mean is
> > - What VM_FAULT_OOM means is not "memory is exhausted" but "something is exhausted".
> >
> > For example, when all hugepages are in use, it may return VM_FAULT_OOM.
> > In particular, when nr_overcommit_hugepage == usage_of_hugepage, it returns VM_FAULT_OOM.
> >
>
> The hugetlb case seems to be the only misuse of VM_FAULT_OOM where it
> doesn't mean we simply don't have the memory to handle the page fault,
> i.e. your earlier "memory is exhausted" definition. That was handled well
> before calling out_of_memory() by simply killing current since we know it
> is faulting hugetlb pages and its resource is limited.
>
> We could pass the vma to pagefault_out_of_memory() and simply kill current
> if it's killable and is_vm_hugetlb_page(vma).
>
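A minimal userspace model of the proposal quoted above, for illustration only. Every type and function name here (struct model_vma, struct model_task, model_pagefault_oom()) is hypothetical, and falling back to the normal OOM path for non-hugetlb or unkillable cases is an assumption, not something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; this is NOT kernel code. */
struct model_vma {
    bool is_hugetlb;          /* stands in for is_vm_hugetlb_page(vma) */
};

struct model_task {
    bool killable;            /* whether the faulting task may be killed */
};

enum oom_action {
    OOM_KILL_CURRENT,         /* limited hugetlb resource: kill only the faulting task */
    OOM_CALL_OOM_KILLER,      /* assumed fallback: the normal OOM path */
};

/* Hypothetical stand-in for a pagefault_out_of_memory() that receives the vma. */
enum oom_action model_pagefault_oom(const struct model_task *task,
                                    const struct model_vma *vma)
{
    if (vma != NULL && vma->is_hugetlb && task->killable)
        return OOM_KILL_CURRENT;
    return OOM_CALL_OOM_KILLER;
}
```

Passing the vma is what would let this path distinguish a hugetlb fault from an ordinary one.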

No. Hugepages are not the only case.
You may not have read it, but we were recently bothered by an i915 driver bug,
and it was clearly a misuse of VM_FAULT_OOM. As a result, we got many OOM killer
reports over the past few months. (Thanks to Kosaki for this.)

A quick glance around the core code:
- HUGEPAGE et al. should return something like VM_FAULT_NO_RESOURCE rather than VM_FAULT_OOM.
- filemap.c's VM_FAULT_OOM shouldn't call page_fault_oom_kill, because it has already
  called the OOM killer if it could.
- In relayfs, VM_FAULT_OOM should probably be a BUG_ON()...
- filemap_xip.c returns VM_FAULT_OOM, but it doesn't seem to be a real OOM; it is
  more like VM_FAULT_NO_VALID_PAGE_FOUND. (But I'm not familiar with this area.)
- fs/buffer.c's VM_FAULT_OOM is returned after the OOM killer has been called.
- shmem.c's VM_FAULT_OOM is returned after the OOM killer has been called.

i915's VM_FAULT_OOM is mysterious; I can't tell whether it is a real OOM or just a
shortage of its own resources. I think VM_FAULT_NO_RESOURCE should be added.
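A sketch of the split proposed here, assuming a hypothetical VM_FAULT_NO_RESOURCE bit: only a genuine memory shortage that the allocator has not already handled reaches the OOM killer. The MODEL_* constants are made-up values for this model, not the kernel's VM_FAULT_* definitions, and model_handle_fault_result() is a hypothetical name.

```c
#include <stdbool.h>

/* Made-up bit values for this model only; NOT the kernel's VM_FAULT_* values.
 * MODEL_VM_FAULT_NO_RESOURCE models the proposed (nonexistent) flag. */
enum {
    MODEL_VM_FAULT_OOM         = 1u << 0,
    MODEL_VM_FAULT_SIGBUS      = 1u << 1,
    MODEL_VM_FAULT_NO_RESOURCE = 1u << 2,
};

enum fault_outcome {
    FAULT_OK,                 /* no error bits set */
    FAULT_INVOKE_OOM_KILLER,  /* memory really is exhausted, OOM killer not yet run */
    FAULT_NO_OOM_KILL,        /* fault failed, but the OOM killer must not be (re)invoked */
    FAULT_SIGBUS,
};

/* oom_already_handled: true when the allocation path already called the
 * OOM killer, so the fault path should not call it a second time. */
enum fault_outcome model_handle_fault_result(unsigned int ret,
                                             bool oom_already_handled)
{
    if (ret & MODEL_VM_FAULT_NO_RESOURCE)
        return FAULT_NO_OOM_KILL;        /* some non-memory resource ran out */
    if (ret & MODEL_VM_FAULT_OOM)
        return oom_already_handled ? FAULT_NO_OOM_KILL
                                   : FAULT_INVOKE_OOM_KILLER;
    if (ret & MODEL_VM_FAULT_SIGBUS)
        return FAULT_SIGBUS;
    return FAULT_OK;
}
```

The oom_already_handled flag stands in for the filemap.c, fs/buffer.c and shmem.c cases listed above, where the OOM killer has already had its chance before VM_FAULT_OOM is returned.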


Thanks,
-Kame

