Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure

From: Artem S. Tashkinov
Date: Mon Aug 05 2019 - 08:02:02 EST


On 8/5/19 9:05 AM, Hillf Danton wrote:

On Sun, 4 Aug 2019 09:23:17 +0000 "Artem S. Tashkinov" <aros@xxxxxxx> wrote:
Hello,

There's this bug which has been bugging many people for many years
already and which is reproducible in less than a few minutes under the
latest and greatest kernel, 5.2.6. All the kernel parameters are set to
defaults.

Thanks for the report!

Steps to reproduce:

1) Boot with mem=4G
2) Disable swap to make everything faster (sudo swapoff -a)
3) Launch a web browser, e.g. Chrome/Chromium and/or Firefox
4) Start opening tabs in either of them and watch your free RAM decrease
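
For step 4, free RAM can be watched from a terminal opened before the stall begins. The following is a minimal sketch, not part of the original report: it assumes /proc/meminfo is readable and uses the MemAvailable field (present since kernel 3.14), falling back to MemFree. The function names parse_meminfo, available_mib and watch are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_mib(text):
    """Return the estimated available memory in MiB."""
    info = parse_meminfo(text)
    # Assumption: fall back to MemFree when MemAvailable is missing.
    kb = info.get("MemAvailable", info.get("MemFree", 0))
    return kb // 1024


def watch(path="/proc/meminfo", interval=1.0, samples=10):
    """Print the available memory once per interval, `samples` times."""
    for _ in range(samples):
        with open(path) as f:
            print(f"available: {available_mib(f.read())} MiB", flush=True)
        time.sleep(interval)
```

Calling watch(samples=600) before opening tabs would log roughly ten minutes of readings; once the stall starts, the script itself may stop making progress.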

We saw another corner-case CPU hog report under memory pressure, also
with swap disabled. In that report the XFS filesystem was a factor,
with CONFIG_MEMCG enabled. Is there anything special, say like

kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [leaker1:7193]
or
[ 3225.313209] Xorg: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0

in your kernel log?
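
One way to check a saved kernel log for the two kinds of messages quoted above is sketched here. This is an illustration only: the regular expressions are derived from the two sample lines in this mail, not from an exhaustive list of kernel message formats, and the names PATTERNS and scan_kernel_log are hypothetical.

```python
import re

# Patterns modelled on the two sample messages quoted in this thread.
PATTERNS = {
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s"),
    "alloc_failure": re.compile(r"page allocation failure: order:(\d+)"),
}


def scan_kernel_log(text):
    """Return (kind, line) pairs for every log line matching a known pattern."""
    hits = []
    for line in text.splitlines():
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((kind, line.strip()))
    return hits
```

The text passed in could be the saved output of dmesg or journalctl -k, captured after the system recovers.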

I'm running ext4 only without LVM, encryption or anything like that.
Plain GPT/MBR partitions with plenty of free space and no disk errors.


Once you hit a situation where opening a new tab requires more RAM than
is currently available, the system will stall hard. You will barely be
able to move the mouse pointer. Your disk LED will be flashing
incessantly (I'm not entirely sure why). You will not be able to run new
applications or close currently running ones.

A CPU hog may come on top of a memory hog in some scenarios.

It might have happened as well - I couldn't tell since I wasn't able to
open a terminal. Once the system recovered there was no trace of
anything extraordinary.


This little crisis may continue for minutes or even longer. I think
that's not how the system should behave in this situation. I believe
something must be done about that to avoid this stall.

Yes, Sir.

I'm almost sure some sysctl parameters could be changed to avoid this
situation, but something tells me this could be done for everyone and
made the default, because some non-tech-savvy users will just give up on
Linux if they ever get into a situation like this, and they won't be keen,
or even able, to Google for solutions.
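
To record which values are in effect when reproducing with "all defaults", the current memory-related sysctls can be read from /proc/sys/vm. The sketch below is only an illustration: the list of names is an assumption about which knobs might be relevant (the report does not name any), and read_vm_sysctls is a hypothetical helper. It reads values and changes nothing.

```python
from pathlib import Path

# Assumption: a few existing vm.* sysctls that are plausibly related to
# reclaim behaviour. The original report does not single out any of them.
SYSCTLS = (
    "swappiness",
    "min_free_kbytes",
    "watermark_scale_factor",
    "vfs_cache_pressure",
)


def read_vm_sysctls(names=SYSCTLS, base="/proc/sys/vm"):
    """Return {name: value-as-string}, or None for entries that cannot be read."""
    result = {}
    for name in names:
        try:
            result[name] = (Path(base) / name).read_text().strip()
        except OSError:
            result[name] = None  # missing on this kernel, or not readable
    return result
```

Printing the returned dict alongside the kernel version gives a baseline to compare against if someone later suggests different settings.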

I don't want to repeat that it is hard to produce one pill for all
patients, but the info you post will help solve the crisis sooner.

Hillf


In case you have trouble reproducing this bug report I can publish a VM
image - still everything is quite mundane: Fedora 30 + XFCE + web
browser. Nothing else, nothing fancy.

Regards,
Artem