Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages per zone

From: Tetsuo Handa
Date: Sun Feb 26 2017 - 01:37:08 EST


Michal Hocko wrote:
> On Wed 22-02-17 11:02:21, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 21-02-17 23:35:07, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > OK, so it seems that all the distractions are handled now and linux-next
> > > > > should provide a reasonable base for testing. You said you weren't able
> > > > > to reproduce the original long stalls on too_many_isolated(). I would be
> > > > > still interested to see those oom reports and potential anomalies in the
> > > > > isolated counts before I send the patch for inclusion so your further
> > > > > testing would be more than appreciated. Also stalls > 10s without any
> > > > > previous occurrences would be interesting.
> > > >
> > > > I confirmed that linux-next-20170221 with kmallocwd applied can reproduce
> > > > infinite too_many_isolated() loop problem. Please send your patches to linux-next.
> > >
> > > So I assume that you didn't see the lockup with the patch applied and
> > > the OOM killer has resolved the situation by killing other tasks, right?
> > > Can I assume your Tested-by?
> >
> > No. I tested linux-next-20170221 which does not include your patch.
> > I didn't test linux-next-20170221 with your patch applied. Your patch will
> > avoid infinite too_many_isolated() loop problem in shrink_inactive_list().
> > But we need to test different workloads by other people. Thus, I suggest
> > you to send your patches to linux-next without my testing.
>
> I will send the patch to Andrew later after merge window closes. It
> would be really helpful, though, to see how it handles your workload
> which is known to reproduce the oom starvation.

I tested http://lkml.kernel.org/r/20170119112336.GN30786@xxxxxxxxxxxxxx
on top of linux-next-20170221 with kmallocwd applied.

I did not hit too_many_isolated() loop problem. But I hit an "unable to invoke
the OOM killer due to !__GFP_FS allocation" lockup problem shown below.

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170226.txt.xz .
----------
[ 444.281177] Killed process 9477 (a.out) total-vm:4168kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[ 444.287046] oom_reaper: reaped process 9477 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 484.810225] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 38s!
[ 484.812907] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 41s!
[ 484.815546] Showing busy workqueues and worker pools:
[ 484.817595] workqueue events: flags=0x0
[ 484.819456] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[ 484.821666] pending: vmpressure_work_fn, vmstat_shepherd, vmw_fb_dirty_flush [vmwgfx]
[ 484.824356] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256
[ 484.826582] pending: drain_local_pages_wq BAR(9595), e1000_watchdog [e1000]
[ 484.829091] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 484.831325] in-flight: 7418:rht_deferred_worker
[ 484.833336] pending: rht_deferred_worker
[ 484.835346] workqueue events_long: flags=0x0
[ 484.837343] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 484.839566] pending: gc_worker [nf_conntrack]
[ 484.841691] workqueue events_power_efficient: flags=0x80
[ 484.843873] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 484.846103] pending: fb_flashcursor
[ 484.847928] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 484.850149] pending: neigh_periodic_work, neigh_periodic_work
[ 484.852403] workqueue events_freezable_power_: flags=0x84
[ 484.854534] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 484.856666] in-flight: 27:disk_events_workfn
[ 484.858621] workqueue writeback: flags=0x4e
[ 484.860347] pwq 256: cpus=0-127 flags=0x4 nice=0 active=2/256
[ 484.862415] in-flight: 8444:wb_workfn wb_workfn
[ 484.864602] workqueue vmstat: flags=0xc
[ 484.866291] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 484.868307] pending: vmstat_update
[ 484.869876] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 484.871864] pending: vmstat_update
[ 484.874058] workqueue mpt_poll_0: flags=0x8
[ 484.875698] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 484.877602] pending: mpt_fault_reset_work [mptbase]
[ 484.879502] workqueue xfs-buf/sda1: flags=0xc
[ 484.881148] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
[ 484.883011] pending: xfs_buf_ioend_work [xfs]
[ 484.884706] workqueue xfs-data/sda1: flags=0xc
[ 484.886367] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=27/256 MAYDAY
[ 484.888410] in-flight: 5356:xfs_end_io [xfs], 451(RESCUER):xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs], 10498:xfs_end_io [xfs], 6386:xfs_end_io [xfs]
[ 484.893483] pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[ 484.902636] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=21/256 MAYDAY
[ 484.904848] in-flight: 535:xfs_end_io [xfs], 7416:xfs_end_io [xfs], 7415:xfs_end_io [xfs], 65:xfs_end_io [xfs]
[ 484.907863] pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[ 484.916767] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=4/256 MAYDAY
[ 484.919024] in-flight: 5357:xfs_end_io [xfs], 193:xfs_end_io [xfs], 52:xfs_end_io [xfs], 5358:xfs_end_io [xfs]
[ 484.922143] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 484.924291] in-flight: 2486:xfs_end_io [xfs]
[ 484.926248] workqueue xfs-reclaim/sda1: flags=0xc
[ 484.928216] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 484.930362] pending: xfs_reclaim_worker [xfs]
[ 484.932312] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 3 6387
[ 484.934766] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=38s workers=6 manager: 19
[ 484.937206] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=41s workers=6 manager: 157
[ 484.939629] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=41s workers=4 manager: 10499
[ 484.942303] pool 256: cpus=0-127 flags=0x4 nice=0 hung=38s workers=3 idle: 425 426
[ 518.090012] MemAlloc-Info: stalling=184 dying=1 exiting=0 victim=1 oom_count=8441307
(...snipped...)
[ 518.900038] MemAlloc: kswapd0(69) flags=0xa40840 switches=23883 uninterruptible
[ 518.902095] kswapd0 D10776 69 2 0x00000000
[ 518.903784] Call Trace:
[ 518.904849] __schedule+0x336/0xe00
[ 518.906118] schedule+0x3d/0x90
[ 518.907314] io_schedule+0x16/0x40
[ 518.908622] __xfs_iflock+0x129/0x140 [xfs]
[ 518.910027] ? autoremove_wake_function+0x60/0x60
[ 518.911559] xfs_reclaim_inode+0x162/0x440 [xfs]
[ 518.913068] xfs_reclaim_inodes_ag+0x2cf/0x4f0 [xfs]
[ 518.914611] ? xfs_reclaim_inodes_ag+0xf2/0x4f0 [xfs]
[ 518.916148] ? trace_hardirqs_on+0xd/0x10
[ 518.917465] ? try_to_wake_up+0x59/0x7a0
[ 518.918758] ? wake_up_process+0x15/0x20
[ 518.920067] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[ 518.921560] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[ 518.923114] super_cache_scan+0x181/0x190
[ 518.924435] shrink_slab+0x29f/0x6d0
[ 518.925683] shrink_node+0x2fa/0x310
[ 518.926909] kswapd+0x362/0x9b0
[ 518.928061] kthread+0x10f/0x150
[ 518.929218] ? mem_cgroup_shrink_node+0x3b0/0x3b0
[ 518.930953] ? kthread_create_on_node+0x70/0x70
[ 518.932380] ret_from_fork+0x31/0x40
(...snipped...)
[ 553.070829] MemAlloc-Info: stalling=184 dying=1 exiting=0 victim=1 oom_count=10318507
[ 575.432697] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 129s!
[ 575.435276] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 131s!
[ 575.437863] Showing busy workqueues and worker pools:
[ 575.439837] workqueue events: flags=0x0
[ 575.441605] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=4/256
[ 575.443717] pending: vmpressure_work_fn, vmstat_shepherd, vmw_fb_dirty_flush [vmwgfx], check_corruption
[ 575.446622] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256
[ 575.448763] pending: drain_local_pages_wq BAR(9595), e1000_watchdog [e1000]
[ 575.451173] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 575.453323] in-flight: 7418:rht_deferred_worker
[ 575.455243] pending: rht_deferred_worker
[ 575.457100] workqueue events_long: flags=0x0
[ 575.458960] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 575.461099] pending: gc_worker [nf_conntrack]
[ 575.463043] workqueue events_power_efficient: flags=0x80
[ 575.465110] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[ 575.467252] pending: fb_flashcursor
[ 575.468966] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[ 575.471109] pending: neigh_periodic_work, neigh_periodic_work
[ 575.473289] workqueue events_freezable_power_: flags=0x84
[ 575.475378] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 575.477526] in-flight: 27:disk_events_workfn
[ 575.479489] workqueue writeback: flags=0x4e
[ 575.481257] pwq 256: cpus=0-127 flags=0x4 nice=0 active=2/256
[ 575.483368] in-flight: 8444:wb_workfn wb_workfn
[ 575.485505] workqueue vmstat: flags=0xc
[ 575.487196] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 575.489242] pending: vmstat_update
[ 575.491403] workqueue mpt_poll_0: flags=0x8
[ 575.493106] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 575.495115] pending: mpt_fault_reset_work [mptbase]
[ 575.497086] workqueue xfs-buf/sda1: flags=0xc
[ 575.498764] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
[ 575.500654] pending: xfs_buf_ioend_work [xfs]
[ 575.502372] workqueue xfs-data/sda1: flags=0xc
[ 575.504024] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=27/256 MAYDAY
[ 575.506060] in-flight: 5356:xfs_end_io [xfs], 451(RESCUER):xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs] xfs_end_io [xfs], 10498:xfs_end_io [xfs], 6386:xfs_end_io [xfs]
[ 575.511096] pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[ 575.520157] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=21/256 MAYDAY
[ 575.522340] in-flight: 535:xfs_end_io [xfs], 7416:xfs_end_io [xfs], 7415:xfs_end_io [xfs], 65:xfs_end_io [xfs]
[ 575.525387] pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
[ 575.534089] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=4/256 MAYDAY
[ 575.536407] in-flight: 5357:xfs_end_io [xfs], 193:xfs_end_io [xfs], 52:xfs_end_io [xfs], 5358:xfs_end_io [xfs]
[ 575.539496] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 575.541648] in-flight: 2486:xfs_end_io [xfs]
[ 575.543591] workqueue xfs-reclaim/sda1: flags=0xc
[ 575.545540] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[ 575.547675] pending: xfs_reclaim_worker [xfs]
[ 575.549719] workqueue xfs-log/sda1: flags=0x1c
[ 575.551591] pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
[ 575.553750] pending: xfs_log_worker [xfs]
[ 575.555552] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 3 6387
[ 575.557979] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=129s workers=6 manager: 19
[ 575.560399] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=131s workers=6 manager: 157
[ 575.562843] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=132s workers=4 manager: 10499
[ 575.565450] pool 256: cpus=0-127 flags=0x4 nice=0 hung=129s workers=3 idle: 425 426
(...snipped...)
[ 616.394649] MemAlloc-Info: stalling=186 dying=1 exiting=0 victim=1 oom_count=13908219
(...snipped...)
[ 642.266252] MemAlloc-Info: stalling=186 dying=1 exiting=0 victim=1 oom_count=15180673
(...snipped...)
[ 702.412189] MemAlloc-Info: stalling=187 dying=1 exiting=0 victim=1 oom_count=18732529
(...snipped...)
[ 736.787879] MemAlloc-Info: stalling=187 dying=1 exiting=0 victim=1 oom_count=20565244
(...snipped...)
[ 800.715759] MemAlloc-Info: stalling=188 dying=1 exiting=0 victim=1 oom_count=24411576
(...snipped...)
[ 837.571405] MemAlloc-Info: stalling=188 dying=1 exiting=0 victim=1 oom_count=26463562
(...snipped...)
[ 899.021495] MemAlloc-Info: stalling=189 dying=1 exiting=0 victim=1 oom_count=30144879
(...snipped...)
[ 936.282709] MemAlloc-Info: stalling=189 dying=1 exiting=0 victim=1 oom_count=32129234
(...snipped...)
[ 997.328119] MemAlloc-Info: stalling=190 dying=1 exiting=0 victim=1 oom_count=35657983
(...snipped...)
[ 1033.977265] MemAlloc-Info: stalling=190 dying=1 exiting=0 victim=1 oom_count=37659912
(...snipped...)
[ 1095.630961] MemAlloc-Info: stalling=190 dying=1 exiting=0 victim=1 oom_count=40639677
(...snipped...)
[ 1095.632984] MemAlloc: kswapd0(69) flags=0xa40840 switches=23883 uninterruptible
[ 1095.632985] kswapd0 D10776 69 2 0x00000000
[ 1095.632988] Call Trace:
[ 1095.632991] __schedule+0x336/0xe00
[ 1095.632994] schedule+0x3d/0x90
[ 1095.632996] io_schedule+0x16/0x40
[ 1095.633017] __xfs_iflock+0x129/0x140 [xfs]
[ 1095.633021] ? autoremove_wake_function+0x60/0x60
[ 1095.633051] xfs_reclaim_inode+0x162/0x440 [xfs]
[ 1095.633072] xfs_reclaim_inodes_ag+0x2cf/0x4f0 [xfs]
[ 1095.633106] ? xfs_reclaim_inodes_ag+0xf2/0x4f0 [xfs]
[ 1095.633114] ? trace_hardirqs_on+0xd/0x10
[ 1095.633116] ? try_to_wake_up+0x59/0x7a0
[ 1095.633120] ? wake_up_process+0x15/0x20
[ 1095.633156] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[ 1095.633178] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[ 1095.633180] super_cache_scan+0x181/0x190
[ 1095.633183] shrink_slab+0x29f/0x6d0
[ 1095.633189] shrink_node+0x2fa/0x310
[ 1095.633193] kswapd+0x362/0x9b0
[ 1095.633200] kthread+0x10f/0x150
[ 1095.633201] ? mem_cgroup_shrink_node+0x3b0/0x3b0
[ 1095.633202] ? kthread_create_on_node+0x70/0x70
[ 1095.633205] ret_from_fork+0x31/0x40
(...snipped...)
[ 1095.821248] MemAlloc-Info: stalling=190 dying=1 exiting=0 victim=1 oom_count=40646791
(...snipped...)
[ 1125.236970] sysrq: SysRq : Resetting
[ 1125.238669] ACPI MEMORY or I/O RESET_REG.
----------

The switches= value (which is "struct task_struct"->nvcsw +
"struct task_struct"->nivcsw ) of kswapd0(69) remained 23883 which means that
kswapd0 was waiting forever at

----------
void
__xfs_iflock(
struct xfs_inode *ip)
{
wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_IFLOCK_BIT);
DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_IFLOCK_BIT);

do {
prepare_to_wait_exclusive(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
if (xfs_isiflocked(ip))
io_schedule(); /***** <= This location. *****/
} while (!xfs_iflock_nowait(ip));

finish_wait(wq, &wait.wait);
}
----------

while the oom_count= value (which is number of times out_of_memory() was called)
was increasing over time without emitting "Killed process " message.

Reproducer I used is shown below.

----------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>

static char use_delay = 0;

static void sigcld_handler(int unused)
{
use_delay = 1;
}

int main(int argc, char *argv[])
{
static char buffer[4096] = { };
char *buf = NULL;
unsigned long size;
int i;
signal(SIGCLD, sigcld_handler);
for (i = 0; i < 1024; i++) {
if (fork() == 0) {
int fd = open("/proc/self/oom_score_adj", O_WRONLY);
write(fd, "1000", 4);
close(fd);
sleep(1);
if (!i)
pause();
snprintf(buffer, sizeof(buffer), "/tmp/file.%u", getpid());
fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer)) {
poll(NULL, 0, 10);
fsync(fd);
}
_exit(0);
}
}
for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
char *cp = realloc(buf, size);
if (!cp) {
size >>= 1;
break;
}
buf = cp;
}
sleep(2);
/* Will cause OOM due to overcommit */
for (i = 0; i < size; i += 4096)
buf[i] = 0;
pause();
return 0;
}
----------