Re: [RFC PATCH] mm, oom: disable dump_tasks by default

From: Qian Cai
Date: Thu Sep 05 2019 - 12:10:59 EST


On Tue, 2019-09-03 at 17:13 +0200, Michal Hocko wrote:
> On Tue 03-09-19 11:02:46, Qian Cai wrote:
> > On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@xxxxxxxx>
> > >
> > > dump_tasks was introduced quite some time ago by fef1bdd68c81
> > > ("oom: add sysctl to enable task memory dump"). Its primary purpose is
> > > to help analyse the oom victim selection decision. This has certainly
> > > been useful at times when the heuristic to choose a victim was much more
> > > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") the
> > > situation has become much more stable (mostly because the only selection
> > > criterion is memory usage) and reports about the wrong process being
> > > shot down have become effectively non-existent.
> >
> > Well, I still see the OOM killer sometimes kill the wrong processes, like ssh
> > or systemd processes, while running LTP OOM tests with straight-forward
> > allocation patterns.
>
> Please report those. Most cases I have seen so far just turned out to
> work as expected and memory hogs just used oom_score_adj or similar.

Here is one where oom01 should have been the one killed.
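
For reference, this is roughly how I read the post-rewrite victim score being
derived from the columns dumped below. A simplified userspace sketch of my
understanding, not the actual mm/oom_kill.c code; the helper name and the
assumed 4K page size are mine:

static long oom_score_sketch(long rss, long swapents, long pgtables_bytes,
			     long oom_score_adj, long totalpages)
{
	/* memory footprint: resident pages + swap entries + page-table pages */
	long points = rss + swapents + pgtables_bytes / 4096;

	/* oom_score_adj shifts the score by up to +/- totalpages */
	points += oom_score_adj * totalpages / 1000;

	return points > 0 ? points : 1;
}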

[92598.855697][ T2588] Swap cache stats: add 105240923, delete 105250445, find 42196/101577
[92598.893970][ T2588] Free swap  = 16383612kB
[92598.913482][ T2588] Total swap = 16465916kB
[92598.932938][ T2588] 7275091 pages RAM
[92598.950212][ T2588] 0 pages HighMem/MovableOnly
[92598.971539][ T2588] 1315554 pages reserved
[92598.990698][ T2588] 16384 pages cma reserved
[92599.010760][ T2588] Tasks state (memory values in pages):
[92599.036265][ T2588] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[92599.080129][ T2588] [   1662]     0  1662    29511     1034   290816      244             0 systemd-journal
[92599.126163][ T2588] [   2586]   998  2586   508086        0   368640     1838             0 polkitd
[92599.168706][ T2588] [   2587]     0  2587    52786        0   421888      500             0 sssd
[92599.210082][ T2588] [   2588]     0  2588    31223        0   139264      195             0 irqbalance
[92599.255606][ T2588] [   2589]    81  2589    18381        0   167936      217          -900 dbus-daemon
[92599.303678][ T2588] [   2590]     0  2590    97260      193   372736      573             0 NetworkManager
[92599.348957][ T2588] [   2594]     0  2594    95350        1   229376      758             0 rngd
[92599.390216][ T2588] [   2598]   995  2598     7364        0    94208      103             0 chronyd
[92599.432447][ T2588] [   2629]     0  2629   106234      399   442368     3836             0 tuned
[92599.473950][ T2588] [   2638]     0  2638    23604        0   212992      240         -1000 sshd
[92599.515158][ T2588] [   2642]     0  2642    10392        0   102400      138             0 rhsmcertd
[92599.560435][ T2588] [   2691]     0  2691    21877        0   208896      277             0 systemd-logind
[92599.605035][ T2588] [   2700]     0  2700     3916        0    69632       45             0 agetty
[92599.646750][ T2588] [   2705]     0  2705    23370        0   225280      393             0 systemd
[92599.688063][ T2588] [   2730]     0  2730    37063        0   294912      667             0 (sd-pam)
[92599.729028][ T2588] [   2922]     0  2922     9020        0    98304      232             0 crond
[92599.769130][ T2588] [   3036]     0  3036    37797        1   307200      305             0 sshd
[92599.813768][ T2588] [   3057]     0  3057    37797        0   303104      335             0 sshd
[92599.853450][ T2588] [   3065]     0  3065     6343        1    86016      163             0 bash
[92599.892899][ T2588] [  38249]     0 38249    58330      293   221184      246             0 rsyslogd
[92599.934457][ T2588] [  11329]     0 11329    55131       73   454656      396             0 sssd_nss
[92599.976240][ T2588] [  11331]     0 11331    54424        1   434176      610             0 sssd_be
[92600.017106][ T2588] [  25247]     0 25247    25746        1   212992      300         -1000 systemd-udevd
[92600.060539][ T2588] [  25391]     0 25391     2184        0    65536       32             0 oom01
[92600.100648][ T2588] [  25392]     0 25392     2184        0    65536       39             0 oom01
[92600.143516][ T2588] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/system.slice/tuned.service,task=tuned,pid=2629,uid=0
[92600.213724][ T2588] Out of memory: Killed process 2629 (tuned) total-vm:424936kB, anon-rss:328kB, file-rss:1268kB, shmem-rss:0kB, UID:0 pgtables:442368kB oom_score_adj:0
[92600.297832][  T305] oom_reaper: reaped process 2629 (tuned), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB


>
> > I just
> > have not had a chance to debug them fully. The situation could be worse with
> > more complex allocations like random stress or fuzz testing.