Re: Still OOM problems with 4.9er/4.10er kernels

From: lkml
Date: Thu Mar 16 2017 - 04:48:21 EST

In reply to: Michal Hocko: "Re: Still OOM problems with 4.9er/4.10er kernels"

On Thu, Mar 16, 2017 at 09:27:14AM +0100, Michal Hocko wrote:
> On Thu 16-03-17 07:38:08, Gerhard Wiesinger wrote:
> [...]
> > The following commit is included in that version:
> > commit 710531320af876192d76b2c1f68190a1df941b02
> > Author: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Wed Feb 22 15:45:58 2017 -0800
> >
> > mm, vmscan: cleanup lru size claculations
> >
> > commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.
>
> This patch shouldn't make any difference. It is a cleanup patch.
> I guess you meant 71ab6cfe88dc ("mm, vmscan: consider eligible zones in
> get_scan_count") but even that one shouldn't make any difference for 64b
> systems.
>
> > But still OOMs:
> > [157048.030760] clamscan: page allocation stalls for 19405ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
>
> This is not an OOM, it is an allocation stall. The allocation request
> simply cannot make forward progress for more than 10s. This alone is bad but
> considering this is GFP_HIGHUSER_MOVABLE which has the full reclaim
> capabilities I would suspect your workload overcommits the available
> memory too much. You only have ~380MB of RAM with ~160MB sitting in the
> anonymous memory, almost nothing in the page cache so I am not surprised
> that you see a constant swap activity. There seems to be only 40M in the
> slab so we are still missing ~180MB which is neither on the LRU lists
> nor allocated by slab. This means that some kernel subsystem allocates
> from the page allocator directly.
>
> That being said, I believe that what you are seeing is not a bug in the
> MM subsystem but rather some subsystem using more memory than it used to
> before so your workload doesn't fit into the amount of memory you have
> anymore.
>
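
For reference, the accounting in the quoted paragraph can be written out as
a small sketch. The figures are the approximate ones Michal quotes (in MB);
the function name is made up for illustration, and page cache is assumed to
be 0 since it is described as "almost nothing".

```python
# Rough accounting sketch using the approximate figures quoted above (MB).
def unaccounted_mb(total_mb, lru_mb, slab_mb):
    """Memory that is neither on the LRU lists nor allocated by slab."""
    return total_mb - lru_mb - slab_mb

total_mb = 380       # "~380MB of RAM"
anon_mb = 160        # anonymous memory on the LRU lists
page_cache_mb = 0    # "almost nothing in the page cache" (assumed 0 here)
slab_mb = 40         # "only 40M in the slab"

missing_mb = unaccounted_mb(total_mb, anon_mb + page_cache_mb, slab_mb)
print(missing_mb)    # 180, matching the "~180MB" in the quoted text
```

Whatever is left over after subtracting the LRU and slab totals is what
would have to come from direct page allocator users.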

While on the topic of understanding allocation stalls, Philip Freeman recently
mailed linux-kernel with a similar report, and in his case there are plenty of
page cache pages. It was also a GFP_HIGHUSER_MOVABLE order-0 allocation.

I'm no MM expert, but it appears a bit broken for such a low-order allocation
to stall on the order of 10 seconds when there are plenty of reclaimable
pages, in addition to abundant and mostly unused swap space on SSD.
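
To compare reports like these, the stall warning can be picked apart with a
short script. This is only a sketch: the regular expression is inferred from
the single warning line quoted above, the function name is made up, and
other kernel versions may format the message differently.

```python
import re

# Pattern assumed from the one warning line quoted in this thread.
STALL_RE = re.compile(
    r"(?P<comm>\S+): page allocation stalls for (?P<ms>\d+)ms, "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-f]+)\((?P<flags>[^)]*)\)"
)

def parse_stall(line):
    """Return the fields of a stall warning as a dict, or None if no match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m["comm"],
        "stall_ms": int(m["ms"]),
        "order": int(m["order"]),
        "mode": int(m["mode"], 16),
        "flags": m["flags"],
    }

line = ("[157048.030760] clamscan: page allocation stalls for 19405ms, "
        "order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)")
info = parse_stall(line)
print(info["comm"], info["stall_ms"], info["order"], info["flags"])
# clamscan 19405 0 GFP_HIGHUSER_MOVABLE
```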

Regards,
Vito Caputo
