[livelock, 3.13.0] livelock when running out of swap space

From: Dave Chinner
Date: Sun Feb 02 2014 - 18:43:05 EST


Hi folks,

I just had a test machine livelock when running a concurrent rm -rf
workload on an XFS filesystem with 64k directory block sizes. The
buffer allocation code started reporting this 5 times a second:

XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)

Which is in GFP_NOFS|GFP_ZERO context. It is likely to have been a
high-order allocation (up to 64k), but there was still plenty of free
memory available (2.8GB of 16GB):

$ free
             total       used       free     shared    buffers     cached
Mem:      16424296   13593732    2830564          0        136       3184
-/+ buffers/cache:   13590412    2833884
Swap:       497976     497976          0
$

But clearly there was no page cache being used. All of the memory in
use was in the inode/dentry caches:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
9486678 9483271  99%    1.19K 364874       26  11675968K xfs_inode
4820508 4820508 100%    0.21K 130284       37   1042272K dentry
4820224 4820224 100%    0.06K  75316       64    301264K kmalloc-64

The issue is that memory allocation was not making progress - the
shrinkers were not doing anything because they were under GFP_NOFS
allocation context, and kswapd was never woken to take over. The
system was completely out of swap space, and all the CPU was being
burnt in this function:

44.91% [kernel] [k] scan_swap_map

The typical stack trace of a looping memory allocation is this:

[211699.924006] CPU: 2 PID: 21939 Comm: rm Not tainted 3.13.0-dgc+ #172
[211699.924006] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[211699.924006] task: ffff88041a7dde40 ti: ffff8803bbeec000 task.ti: ffff8803bbeec000
[211699.924006] RIP: 0010:[<ffffffff81187dd8>] [<ffffffff81187dd8>] scan_swap_map+0x118/0x520
[211699.924006] RSP: 0018:ffff8803bbeed508 EFLAGS: 00000297
[211699.924006] RAX: 000000000000a1ba RBX: 0000000000000032 RCX: 0000000000000000
[211699.924006] RDX: 0000000000000001 RSI: 000000000001e64e RDI: 0000000000019def
[211699.924006] RBP: ffff8803bbeed558 R08: 00000000002c6ba0 R09: 0000000000000000
[211699.924006] R10: 57ffb90ace3d4f80 R11: 0000000000019def R12: ffff88041a682900
[211699.924006] R13: 01ffffffffffffff R14: 0000000000000040 R15: ffff88041a6829a0
[211699.924006] FS: 00007fae682c8700(0000) GS:ffff88031bc00000(0000) knlGS:0000000000000000
[211699.924006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[211699.924006] CR2: 00000000004353b0 CR3: 00000002ea498000 CR4: 00000000000006e0
[211699.924006] Stack:
[211699.924006] ffff8803bbeed538 ffff88031a9dc720 000000000000a1ba 0000000000004893
[211699.924006] 0000000000000000 ffff88041a682900 0000000000000000 0000000000000001
[211699.924006] 0000000000000000 ffff88041a6829a0 ffff8803bbeed598 ffffffff8118837f
[211699.924006] Call Trace:
[211699.924006] [<ffffffff8118837f>] get_swap_page+0xef/0x1e0
[211699.924006] [<ffffffff81184e34>] add_to_swap+0x24/0x70
[211699.924006] [<ffffffff8115f110>] shrink_page_list+0x300/0xa20
[211699.924006] [<ffffffff81169089>] ? __mod_zone_page_state+0x49/0x50
[211699.924006] [<ffffffff8116a3b9>] ? wait_iff_congested+0xa9/0x150
[211699.924006] [<ffffffff8115fe03>] shrink_inactive_list+0x243/0x480
[211699.924006] [<ffffffff811606f1>] shrink_lruvec+0x371/0x670
[211699.924006] [<ffffffff81cdb4ce>] ? _raw_spin_unlock+0xe/0x10
[211699.924006] [<ffffffff81160dea>] do_try_to_free_pages+0x11a/0x360
[211699.924006] [<ffffffff81161220>] try_to_free_pages+0x110/0x190
[211699.924006] [<ffffffff81156422>] __alloc_pages_nodemask+0x5a2/0x8a0
[211699.924006] [<ffffffff8118fac2>] alloc_pages_current+0xb2/0x170
[211699.924006] [<ffffffff81151bde>] __get_free_pages+0xe/0x50
[211699.924006] [<ffffffff8116d199>] kmalloc_order_trace+0x39/0xb0
[211699.924006] [<ffffffff810cf4c3>] ? finish_wait+0x63/0x80
[211699.924006] [<ffffffff81197156>] __kmalloc+0x176/0x180
[211699.924006] [<ffffffff810cf520>] ? __init_waitqueue_head+0x40/0x40
[211699.924006] [<ffffffff814a74f7>] kmem_alloc+0x77/0xf0
[211699.924006] [<ffffffff814feb54>] xfs_log_commit_cil+0x3c4/0x5a0
[211699.924006] [<ffffffff814a6be3>] xfs_trans_commit+0xc3/0x2d0
[211699.924006] [<ffffffff814e913e>] xfs_remove+0x3be/0x440
[211699.924006] [<ffffffff811b7d8d>] ? __d_lookup+0x11d/0x170
[211699.924006] [<ffffffff8149b842>] xfs_vn_unlink+0x52/0xa0
[211699.924006] [<ffffffff811acc22>] vfs_unlink+0xf2/0x160
[211699.924006] [<ffffffff811acef0>] do_unlinkat+0x260/0x2a0
[211699.924006] [<ffffffff811b003b>] SyS_unlinkat+0x1b/0x40
[211699.924006] [<ffffffff81ce3ea9>] system_call_fastpath+0x16/0x1b

i.e. trying to do memory allocation during a transaction commit in
XFS, and that is looping in kmem_alloc().

The problem in this case is that kswapd was not being started to
free slab cache memory (i.e. to handle the deferred GFP_NOFS slab
reclaim). It stayed in the livelock state for over an hour before I
broke it by running "echo 2 > /proc/sys/vm/drop_caches" manually.
That immediately freed up the slab cache as reclaim was not under
GFP_NOFS constraints and the livelock went away and the system
started to make progress again.

I haven't seen this problem before, so this may be a regression...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx