Re: v5.10.1 xfs deadlock

From: Brian Foster
Date: Thu Dec 17 2020 - 14:45:08 EST


On Thu, Dec 17, 2020 at 06:44:51PM +0100, Donald Buczek wrote:
> Dear xfs developer,
>
> I was doing some testing on a Linux 5.10.1 system with two 100 TB xfs filesystems on md raid6 raids.
>
> The stress test was essentially `cp -a`ing a Linux source repository with two threads in parallel on each filesystem.
>
> After about on hour, the processes to one filesystem (md1) blocked, 30 minutes later the process to the other filesystem (md0) did.
>
> root 7322 2167 0 Dec16 pts/1 00:00:06 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/1/linux.018.TMP
> root 7329 2169 0 Dec16 pts/1 00:00:05 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/2/linux.019.TMP
> root 13856 2170 0 Dec16 pts/1 00:00:08 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/2/linux.028.TMP
> root 13899 2168 0 Dec16 pts/1 00:00:05 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/1/linux.027.TMP
>
> Some info from the system (all stack traces, slabinfo) is available here: https://owww.molgen.mpg.de/~buczek/2020-12-16.info.txt
>
> It stands out, that there are many (549 for md0, but only 10 for md1) "xfs-conv" threads all with stacks like this
>
> [<0>] xfs_log_commit_cil+0x6cc/0x7c0
> [<0>] __xfs_trans_commit+0xab/0x320
> [<0>] xfs_iomap_write_unwritten+0xcb/0x2e0
> [<0>] xfs_end_ioend+0xc6/0x110
> [<0>] xfs_end_io+0xad/0xe0
> [<0>] process_one_work+0x1dd/0x3e0
> [<0>] worker_thread+0x2d/0x3b0
> [<0>] kthread+0x118/0x130
> [<0>] ret_from_fork+0x22/0x30
>
> xfs_log_commit_cil+0x6cc is
>
> xfs_log_commit_cil()
> xlog_cil_push_background(log)
> xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
>
> Some other threads, including the four "cp" commands are also blocking at xfs_log_commit_cil+0x6cc
>
> There are also single "flush" process for each md device with this stack signature:
>
> [<0>] xfs_map_blocks+0xbf/0x400
> [<0>] iomap_do_writepage+0x15e/0x880
> [<0>] write_cache_pages+0x175/0x3f0
> [<0>] iomap_writepages+0x1c/0x40
> [<0>] xfs_vm_writepages+0x59/0x80
> [<0>] do_writepages+0x4b/0xe0
> [<0>] __writeback_single_inode+0x42/0x300
> [<0>] writeback_sb_inodes+0x198/0x3f0
> [<0>] __writeback_inodes_wb+0x5e/0xc0
> [<0>] wb_writeback+0x246/0x2d0
> [<0>] wb_workfn+0x26e/0x490
> [<0>] process_one_work+0x1dd/0x3e0
> [<0>] worker_thread+0x2d/0x3b0
> [<0>] kthread+0x118/0x130
> [<0>] ret_from_fork+0x22/0x30
>
> xfs_map_blocks+0xbf is the
>
> xfs_ilock(ip, XFS_ILOCK_SHARED);
>
> in xfs_map_blocks().
>
> The system is low on free memory
>
> MemTotal: 197587764 kB
> MemFree: 2196496 kB
> MemAvailable: 189895408 kB
>
> but responsive.
>
> I have an out of tree driver for the HBA ( smartpqi 2.1.6-005 pulled from linux-scsi) , but it is unlikely that this blocking is related to that, because the md block devices itself are responsive (`xxd /dev/md0` )
>
> I can keep the system in the state for a while. Is there an idea what was going from or an idea what data I could collect from the running system to help? I have full debug info and could walk lists or retrieve data structures with gdb.
>

It might be useful to dump the values under /sys/fs/xfs/<dev>/log/* for
each fs to get an idea of the state of the logs as well...

Brian

> Best
> Donald
>