Traversing XFS mounted VM puts process in D state

From: Dinesh Pathak
Date: Thu Dec 07 2017 - 20:17:14 EST


Hi,

We are mounting and traversing a backup of a VM that uses an XFS
filesystem. Sometimes during the traversal, the process goes into D
state and cannot be killed. Eventually the system has to be rebooted
via IPMI. This happens about once in 100 runs.

The VM backup is stored on NFS. We first mount the NFS export, then
loopback-mount the partition that contains the XFS filesystem. After
that we traverse the filesystem. The traversal is not necessarily
multi-threaded: we have seen the issue with both single-threaded and
multi-threaded traversal.
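
For reference, the traversal is roughly equivalent to the sketch below.
This is not our actual tool: the function name `traverse` is a
placeholder, and the real traversal may do more per entry. It lists
each directory and stats every entry without following symlinks, which
is the same getdents/lstat pattern that appears in the call trace at
the end of this mail.

```python
import os
import stat


def traverse(root):
    """Walk `root` single-threaded, stat every entry, return (files, dirs)."""
    files = dirs = 0
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            names = os.listdir(current)   # directory read (getdents)
        except OSError:
            continue                      # unreadable or missing directory
        for name in names:
            path = os.path.join(current, name)
            try:
                info = os.lstat(path)     # stat without following symlinks
            except OSError:
                continue                  # entry vanished or is unreadable
            if stat.S_ISDIR(info.st_mode):
                dirs += 1
                pending.append(path)      # descend into it later
            else:
                files += 1
    return files, dirs
```

Calling `traverse()` on the loopback mount point returns the number of
non-directory entries and directories seen. When the hang occurs, a
call like `os.lstat()` never returns.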

A similar problem is reported here:
https://access.redhat.com/solutions/2456711
The resolution given there is to upgrade the kernel to
kernel-3.10.0-514.el7 (RHSA-2016-2574, RHEL 7.3). Upgrading the kernel
may not be possible for us. Is there a patch, or a set of patches, that
we can apply to fix this issue?

Another thread says that this issue is fixed only in the above kernel
version, and that it is seen in earlier as well as later versions:
https://bugs.centos.org/view.php?id=13843&history=1

Is there any way to reproduce this problem? All our attempts to
reproduce it have failed.

Please let me know if any more debugging can be done.
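
One way to collect more data the next time the hang occurs is sketched
below. It scans /proc for tasks in state D and, where permitted, reads
their kernel stack. The helper names `parse_stat` and `d_state_tasks`
are made up for this example. Reading /proc/<pid>/stack usually needs
root and a kernel built with CONFIG_STACKTRACE, and the sketch only
looks at the main thread of each process (per-thread state is under
/proc/<pid>/task/<tid>/stat).

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(left), comm, tail.split()[0]


def d_state_tasks(proc="/proc"):
    """Yield (pid, comm, kernel_stack) for tasks in uninterruptible sleep."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue                      # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue                      # task exited while we were looking
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = ""                    # no permission or no stack file
        yield pid, comm, stack
```

Iterating over `d_state_tasks()` periodically and logging the result
would show which tasks are blocked and where. The kernel can also dump
all blocked tasks to its log with `echo w > /proc/sysrq-trigger`.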

Thanks,
Dinesh

Kernel version of the source VM whose backup is taken:

[root@web-2318 ~]# uname -a
Linux web-2318.website.oxilion.nl 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



Kernel version of the machine where the backup is mounted and traversed:
3.10.0-327.22.2.el7.x86_64 #1 SMP Tue Jul 5 12:41:09 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux


[Mon Dec 4 21:08:21 2017] yoda_exec D 0000000000000000 0 48948 48938 0x00000000
[Mon Dec 4 21:08:21 2017] ffff8801052437b0 0000000000000086 ffff88000aa02e00 ffff880105243fd8
[Mon Dec 4 21:08:21 2017] ffff880105243fd8 ffff880105243fd8 ffff88000aa02e00 ffff88010521e730
[Mon Dec 4 21:08:21 2017] 7fffffffffffffff ffff88000aa02e00 0000000000000002 0000000000000000

[Mon Dec 4 21:08:21 2017] Call Trace:
[Mon Dec 4 21:08:21 2017] [<ffffffff8163b7f9>] schedule+0x29/0x70
[Mon Dec 4 21:08:21 2017] [<ffffffff816394e9>] schedule_timeout+0x209/0x2d0
[Mon Dec 4 21:08:21 2017] [<ffffffffa07a2e67>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffff8163ab22>] __down_common+0xd2/0x14a
[Mon Dec 4 21:08:21 2017] [<ffffffffa07b00cd>] ? _xfs_buf_find+0x16d/0x2c0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffff8163abb7>] __down+0x1d/0x1f
[Mon Dec 4 21:08:21 2017] [<ffffffff810ab921>] down+0x41/0x50
[Mon Dec 4 21:08:21 2017] [<ffffffffa07afecc>] xfs_buf_lock+0x3c/0xd0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07b00cd>] _xfs_buf_find+0x16d/0x2c0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07b024a>] xfs_buf_get_map+0x2a/0x180 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07b0d2c>] xfs_buf_read_map+0x2c/0x140 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07dd829>] xfs_trans_read_buf_map+0x199/0x400 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa0790204>] xfs_da_read_buf+0xd4/0x100 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa0790253>] xfs_da3_node_read+0x23/0xd0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffff811c153a>] ? kmem_cache_alloc+0x1ba/0x1d0
[Mon Dec 4 21:08:21 2017] [<ffffffffa07914ce>] xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa079bded>] xfs_dir2_node_lookup+0x4d/0x170 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07937b5>] xfs_dir_lookup+0x195/0x1b0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07c1bb6>] xfs_lookup+0x66/0x110 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffffa07bea0b>] xfs_vn_lookup+0x7b/0xd0 [xfs]
[Mon Dec 4 21:08:21 2017] [<ffffffff811e8cad>] lookup_real+0x1d/0x50
[Mon Dec 4 21:08:21 2017] [<ffffffff811e9622>] __lookup_hash+0x42/0x60
[Mon Dec 4 21:08:21 2017] [<ffffffff8163342b>] lookup_slow+0x42/0xa7
[Mon Dec 4 21:08:21 2017] [<ffffffff811ee4f3>] path_lookupat+0x773/0x7a0
[Mon Dec 4 21:08:21 2017] [<ffffffff81186f6a>] ? kvfree+0x2a/0x40
[Mon Dec 4 21:08:21 2017] [<ffffffff811c13b5>] ? kmem_cache_alloc+0x35/0x1d0
[Mon Dec 4 21:08:21 2017] [<ffffffff811ef1ef>] ? getname_flags+0x4f/0x1a0
[Mon Dec 4 21:08:21 2017] [<ffffffff811ee54b>] filename_lookup+0x2b/0xc0
[Mon Dec 4 21:08:21 2017] [<ffffffff811f0317>] user_path_at_empty+0x67/0xc0
[Mon Dec 4 21:08:21 2017] [<ffffffff811f0381>] user_path_at+0x11/0x20
[Mon Dec 4 21:08:21 2017] [<ffffffff811e3bc3>] vfs_fstatat+0x63/0xc0
[Mon Dec 4 21:08:21 2017] [<ffffffff811e4191>] SYSC_newlstat+0x31/0x60
[Mon Dec 4 21:08:21 2017] [<ffffffff811f27fc>] ? vfs_readdir+0x8c/0xe0
[Mon Dec 4 21:08:21 2017] [<ffffffff811f2cad>] ? SyS_getdents+0xfd/0x120
[Mon Dec 4 21:08:21 2017] [<ffffffff811e441e>] SyS_newlstat+0xe/0x10
[Mon Dec 4 21:08:21 2017] [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
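
To make the trace above easier to read, the sketch below extracts only
the frames that are not marked with `?`. Frames with `?` are addresses
the unwinder found on the stack but could not confirm as part of the
call chain. The names `FRAME` and `reliable_frames` are made up for
this example, and the regular expression is only an assumption about
this log format (it tolerates a frame whose symbol wrapped onto the
next line).

```python
import re

# Matches e.g. "[<ffffffffa07afecc>] xfs_buf_lock+0x3c/0xd0 [xfs]"
# and the unconfirmed form "[<...>] ? kvfree+0x2a/0x40".
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return [(function, module_or_None)] for frames not marked with '?'."""
    frames = []
    for match in FRAME.finditer(log_text):
        if match.group("guess"):
            continue                      # skip unconfirmed '?' frames
        frames.append((match.group("func"), match.group("module")))
    return frames
```

Applied to this trace, the innermost confirmed frames are schedule,
schedule_timeout, __down_common, __down, down, xfs_buf_lock and
_xfs_buf_find. In other words, the task is sleeping in down() on an XFS
buffer lock, reached through a directory lookup that was triggered by
an lstat() call.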