[3.2-rc2] loop device balance_dirty_pages_nr throttling hang

From: Dave Chinner
Date: Mon Nov 21 2011 - 09:21:02 EST


Hi Fengguang,

I just found a way of hanging a system and taking it down. I haven't
tried to narrow down the test case - it's pretty simple - because it
time for sleep here.

$ uname -a
Linux test-2 3.2.0-rc2-dgc+ #89 SMP Thu Nov 17 15:25:19 EST 2011 x86_64 GNU/Linux

Create a 20TB sparse loop device on 11GB XFS filesystem:

$ sudo xfs_io -f -c "truncate 20T" /mnt/scratch/scratch.img
$ sudo losetup /dev/loop0 /mnt/scratch/scratch.img

Make an ext4 filesystem on the loop device:

$ sudo mkfs.ext4 /dev/loop0
mke2fs 1.42-WIP (16-Oct-2011)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
335544320 inodes, 5368709120 blocks
268435456 blocks (5.00%) reserved for the super user
.....

Mount the filesystem:

$ sudo mount /dev/loop0 /mnt/scratch/scratch

Try to preallocate a 15TB file:

$ sudo time xfs_io -f -F -c "truncate 15T " -c "falloc 0 15T" /mnt/scratch/scratch/foo

The command hangs very soon after this, with these hung processes:

[ 283.437409] SysRq : Show Blocked State
[ 283.438563] task PC stack pid father
[ 283.439837] rsyslogd D ffff88007691ea48 0 1633 1 0x00000000
[ 283.440038] ffff880076a71bf8 0000000000000082 0000000000009f1c ffffffffb135a68d
[ 283.440038] ffff88007691e6c0 ffff880076a71fd8 ffff880076a71fd8 ffff880076a71fd8
[ 283.440038] ffff880031402540 ffff88007691e6c0 ffff880076a71bf8 ffffffff810b59ed
[ 283.440038] Call Trace:
[ 283.440038] [<ffffffff810b59ed>] ? ktime_get_ts+0xad/0xe0
[ 283.440038] [<ffffffff81117650>] ? sleep_on_page+0x20/0x20
[ 283.440038] [<ffffffff81aab3af>] schedule+0x3f/0x60
[ 283.440038] [<ffffffff81aab45f>] io_schedule+0x8f/0xd0
[ 283.440038] [<ffffffff8111765e>] sleep_on_page_killable+0xe/0x40
[ 283.440038] [<ffffffff81aabc2f>] __wait_on_bit+0x5f/0x90
[ 283.440038] [<ffffffff8114c0ff>] ? read_swap_cache_async+0x4f/0x160
[ 283.440038] [<ffffffff81119d60>] wait_on_page_bit_killable+0x80/0x90
[ 283.440038] [<ffffffff810ac360>] ? autoremove_wake_function+0x40/0x40
[ 283.440038] [<ffffffff81119e16>] __lock_page_or_retry+0xa6/0xd0
[ 283.440038] [<ffffffff8113bba7>] handle_pte_fault+0x887/0x8b0
[ 283.440038] [<ffffffff8169ee6d>] ? copy_user_generic_string+0x2d/0x40
[ 283.440038] [<ffffffff8113bee5>] handle_mm_fault+0x155/0x250
[ 283.440038] [<ffffffff81ab0a62>] do_page_fault+0x142/0x4f0
[ 283.440038] [<ffffffff81ab0565>] do_async_page_fault+0x35/0x80
[ 283.440038] [<ffffffff81aad935>] async_page_fault+0x25/0x30
[ 283.440038] loop0 D ffff8800775fe4c8 0 2658 2 0x00000000
[ 283.440038] ffff88007b6b7810 0000000000000046 ffff88007b6b77f0 ffff88007fd12580
[ 283.440038] ffff8800775fe140 ffff88007b6b7fd8 ffff88007b6b7fd8 ffff88007b6b7fd8
[ 283.440038] ffff88007b1ea740 ffff8800775fe140 ffff88007b6b77f0 ffffffff81aad47e
[ 283.440038] Call Trace:
[ 283.440038] [<ffffffff81aad47e>] ? _raw_spin_lock_irqsave+0x2e/0x40
[ 283.440038] [<ffffffff81aab3af>] schedule+0x3f/0x60
[ 283.440038] [<ffffffff81aab894>] schedule_timeout+0x144/0x2d0
[ 283.440038] [<ffffffff810985d0>] ? usleep_range+0x50/0x50
[ 283.440038] [<ffffffff81aab6c2>] io_schedule_timeout+0xa2/0x100
[ 283.440038] [<ffffffff81121b21>] ? bdi_dirty_limit+0x31/0xc0
[ 283.440038] [<ffffffff811221f8>] balance_dirty_pages_ratelimited_nr+0x298/0x6d0
[ 283.440038] [<ffffffff81190d94>] ? block_write_end+0x44/0x80
[ 283.440038] [<ffffffff81117bc8>] generic_file_buffered_write+0x1a8/0x250
[ 283.440038] [<ffffffff8142262c>] xfs_file_buffered_aio_write+0xec/0x1b0
[ 283.440038] [<ffffffff8142285a>] xfs_file_aio_write+0x16a/0x2a0
[ 283.440038] [<ffffffff8115f682>] do_sync_write+0xd2/0x110
[ 283.440038] [<ffffffff81081856>] ? load_balance+0xb6/0x8e0
[ 283.440038] [<ffffffff8184a074>] __do_lo_send_write+0x54/0xa0
[ 283.440038] [<ffffffff8184a3b1>] do_lo_send_direct_write+0x81/0xa0
[ 283.440038] [<ffffffff8184af77>] do_bio_filebacked+0x227/0x2e0
[ 283.440038] [<ffffffff8184a330>] ? transfer_xor+0xf0/0xf0
[ 283.440038] [<ffffffff8119401d>] ? bio_free+0x4d/0x60
[ 283.440038] [<ffffffff81194045>] ? bio_fs_destructor+0x15/0x20
[ 283.440038] [<ffffffff8119346b>] ? bio_put+0x2b/0x30
[ 283.440038] [<ffffffff8184bf42>] loop_thread+0xc2/0x250
[ 283.440038] [<ffffffff810ac320>] ? add_wait_queue+0x60/0x60
[ 283.440038] [<ffffffff8184be80>] ? loop_set_status_old+0x1e0/0x1e0
[ 283.440038] [<ffffffff810ab87c>] kthread+0x8c/0xa0
[ 283.440038] [<ffffffff81ab7174>] kernel_thread_helper+0x4/0x10
[ 283.440038] [<ffffffff810ab7f0>] ? flush_kthread_worker+0xa0/0xa0
[ 283.440038] [<ffffffff81ab7170>] ? gs_change+0x13/0x13
[ 283.440038] jbd2/loop0-8 D ffff88000ca08748 0 2813 2 0x00000000
[ 283.440038] ffff88000c9f7bc0 0000000000000046 0000000000000000 ffffffffb135a68d
[ 283.440038] ffff88000ca083c0 ffff88000c9f7fd8 ffff88000c9f7fd8 ffff88000c9f7fd8
[ 283.440038] ffff88007c8602c0 ffff88000ca083c0 ffff88000c9f7bc0 00000001810b59ed
[ 283.440038] Call Trace:
[ 283.440038] [<ffffffff8118eec0>] ? __wait_on_buffer+0x30/0x30
[ 283.440038] [<ffffffff81aab3af>] schedule+0x3f/0x60
[ 283.440038] [<ffffffff81aab45f>] io_schedule+0x8f/0xd0
[ 283.440038] [<ffffffff8118eece>] sleep_on_buffer+0xe/0x20
[ 283.440038] [<ffffffff81aabc2f>] __wait_on_bit+0x5f/0x90
[ 283.440038] [<ffffffff8167e177>] ? generic_make_request+0xc7/0x100
[ 283.440038] [<ffffffff8118eec0>] ? __wait_on_buffer+0x30/0x30
[ 283.440038] [<ffffffff81aabcdc>] out_of_line_wait_on_bit+0x7c/0x90
[ 283.440038] [<ffffffff810ac360>] ? autoremove_wake_function+0x40/0x40
[ 283.440038] [<ffffffff8118eebe>] __wait_on_buffer+0x2e/0x30
[ 283.440038] [<ffffffff8129d2ef>] jbd2_journal_commit_transaction+0xb0f/0x15d0
[ 283.440038] [<ffffffff810998da>] ? try_to_del_timer_sync+0x8a/0x110
[ 283.440038] [<ffffffff812a15db>] kjournald2+0xbb/0x220
[ 283.440038] [<ffffffff810ac320>] ? add_wait_queue+0x60/0x60
[ 283.440038] [<ffffffff812a1520>] ? commit_timeout+0x10/0x10
[ 283.440038] [<ffffffff810ab87c>] kthread+0x8c/0xa0
[ 283.440038] [<ffffffff81ab7174>] kernel_thread_helper+0x4/0x10
[ 283.440038] [<ffffffff810ab7f0>] ? flush_kthread_worker+0xa0/0xa0
[ 283.440038] [<ffffffff81ab7170>] ? gs_change+0x13/0x13
[ 283.440038] xfs_io D ffff880044fcca88 0 2817 2816 0x00000000
[ 283.440038] ffff880012b77ac8 0000000000000086 0000000000000000 ffffffffb135a68d
[ 283.440038] ffff880044fcc700 ffff880012b77fd8 ffff880012b77fd8 ffff880012b77fd8
[ 283.440038] ffff88007c8602c0 ffff880044fcc700 ffff880012b77ac8 00000001810b59ed
[ 283.440038] Call Trace:
[ 283.440038] [<ffffffff8118eec0>] ? __wait_on_buffer+0x30/0x30
[ 283.440038] [<ffffffff81aab3af>] schedule+0x3f/0x60
[ 283.440038] [<ffffffff81aab45f>] io_schedule+0x8f/0xd0
[ 283.440038] [<ffffffff8118eece>] sleep_on_buffer+0xe/0x20
[ 283.440038] [<ffffffff81aabc2f>] __wait_on_bit+0x5f/0x90
[ 283.440038] [<ffffffff8118eec0>] ? __wait_on_buffer+0x30/0x30
[ 283.440038] [<ffffffff81aabcdc>] out_of_line_wait_on_bit+0x7c/0x90
[ 283.440038] [<ffffffff810ac360>] ? autoremove_wake_function+0x40/0x40
[ 283.440038] [<ffffffff8118eebe>] __wait_on_buffer+0x2e/0x30
[ 283.440038] [<ffffffff8129f531>] jbd2_log_do_checkpoint+0x4c1/0x4e0
[ 283.440038] [<ffffffff8129f5ed>] __jbd2_log_wait_for_space+0x9d/0x1b0
[ 283.440038] [<ffffffff8129a4e0>] start_this_handle.isra.9+0x390/0x490
[ 283.440038] [<ffffffff810ac320>] ? add_wait_queue+0x60/0x60
[ 283.440038] [<ffffffff8129a6aa>] jbd2__journal_start+0xca/0x110
[ 283.440038] [<ffffffff8129a703>] jbd2_journal_start+0x13/0x20
[ 283.440038] [<ffffffff8127189f>] ext4_journal_start_sb+0x7f/0x1d0
[ 283.440038] [<ffffffff8127ddd4>] ? ext4_fallocate+0x1a4/0x530
[ 283.440038] [<ffffffff8127ddd4>] ext4_fallocate+0x1a4/0x530
[ 283.440038] [<ffffffff8115e992>] do_fallocate+0xf2/0x160
[ 283.440038] [<ffffffff8115ea4b>] sys_fallocate+0x4b/0x70
[ 283.440038] [<ffffffff81ab5082>] system_call_fastpath+0x16/0x1b
[ 283.440038] Sched Debug Version: v0.10, 3.2.0-rc2-dgc+ #89

Looks like the prealloc created lots of dirty pages:

$ cat /proc/meminfo
MemTotal: 2050356 kB
MemFree: 17676 kB
Buffers: 518888 kB
Cached: 1367132 kB
SwapCached: 1448 kB
Active: 423444 kB
Inactive: 1476300 kB
Active(anon): 8948 kB
Inactive(anon): 4796 kB
Active(file): 414496 kB
Inactive(file): 1471504 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 497976 kB
SwapFree: 491956 kB
Dirty: 392664 kB
Writeback: 0 kB
AnonPages: 12416 kB
Mapped: 6896 kB
Shmem: 24 kB
Slab: 105844 kB
SReclaimable: 92252 kB
SUnreclaim: 13592 kB
KernelStack: 848 kB
PageTables: 2508 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1523152 kB
Committed_AS: 61012 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 15260 kB
VmallocChunk: 34359723095 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 6132 kB
DirectMap2M: 2091008 kB
$
$
$ cat /proc/vmstat
nr_free_pages 4378
nr_inactive_anon 1199
nr_active_anon 2232
nr_inactive_file 367877
nr_active_file 103629
nr_unevictable 0
nr_mlock 0
nr_anon_pages 3125
nr_mapped 1724
nr_file_pages 471873
nr_dirty 98166
nr_writeback 0
nr_slab_reclaimable 23064
nr_slab_unreclaimable 3416
nr_page_table_pages 628
nr_kernel_stack 106
nr_unstable 0
nr_bounce 0
nr_vmscan_write 1597
nr_vmscan_immediate_reclaim 1752
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 6
nr_dirtied 548796
nr_written 406231
numa_hit 1086286
numa_miss 0
numa_foreign 0
numa_interleave 20982
numa_local 1086286
numa_other 0
nr_anon_transparent_hugepages 0
nr_dirty_threshold 95863
nr_dirty_background_threshold 47931
pgpgin 181756
pgpgout 2417611
pswpin 437
pswpout 1597
pgalloc_dma 7445
pgalloc_dma32 1083448
pgalloc_normal 0
pgalloc_movable 0
pgfree 1095515
pgactivate 104582
pgdeactivate 3069
pgfault 677067
pgmajfault 460
pgrefill_dma 771
pgrefill_dma32 2298
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 5415
pgsteal_dma32 90506
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 8719
pgscan_kswapd_dma32 89764
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 2969
pgscan_direct_normal 0
pgscan_direct_movable 0
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 1024
kswapd_steal 92952
kswapd_inodesteal 0
kswapd_low_wmark_hit_quickly 187
kswapd_high_wmark_hit_quickly 0
kswapd_skip_congestion_wait 0
pageoutrun 1083
allocstall 38
pgrotated 1623
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 0
unevictable_pgs_scanned 0
unevictable_pgs_rescued 0
unevictable_pgs_mlocked 0
unevictable_pgs_munlocked 0
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

And the loop device writeback is stalled in
balance_dirty_pages_nr(). Any further writes to the system result in
those processes also hanging in balance_dirty_pages_nr().

I thought this might be a one-off, but I rebooted the machine and
from a clean boot, ran the above commands and it entered the same
state where I pulled out the above traces via sysrq-w. I'm not sure
what is going on yet.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/