Re: Hang in XFS reclaim on 3.7.0-rc3

From: Torsten Kaiser
Date: Tue Nov 20 2012 - 02:08:56 EST


On Tue, Nov 20, 2012 at 12:53 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Nov 19, 2012 at 07:50:06AM +0100, Torsten Kaiser wrote:
> So, both lockdep thingy's are the same:

I suspected this, but as the reports where slightly different I
attached bith of them, as I couldn't decide witch one was the
better/simpler report to debug this.

>> [110926.972482] =========================================================
>> [110926.972484] [ INFO: possible irq lock inversion dependency detected ]
>> [110926.972486] 3.7.0-rc4 #1 Not tainted
>> [110926.972487] ---------------------------------------------------------
>> [110926.972489] kswapd0/725 just changed the state of lock:
>> [110926.972490] (sb_internal){.+.+.?}, at: [<ffffffff8122b268>] xfs_trans_alloc+0x28/0x50
>> [110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the past:
>> [110926.972500] (&(&ip->i_lock)->mr_lock/1){+.+.+.}
>
> Ah, what? Since when has the ilock been reclaim unsafe?
>
>> [110926.972500] and interrupts could create inverse lock ordering between them.
>> [110926.972500]
>> [110926.972503]
>> [110926.972503] other info that might help us debug this:
>> [110926.972504] Possible interrupt unsafe locking scenario:
>> [110926.972504]
>> [110926.972505] CPU0 CPU1
>> [110926.972506] ---- ----
>> [110926.972507] lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972509] local_irq_disable();
>> [110926.972509] lock(sb_internal);
>> [110926.972511] lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972512] <Interrupt>
>> [110926.972513] lock(sb_internal);
>
> Um, that's just bizzare. No XFS code runs with interrupts disabled,
> so I cannot see how this possible.
>
> .....
>
>
> [<ffffffff8108137e>] mark_held_locks+0x7e/0x130
> [<ffffffff81081a63>] lockdep_trace_alloc+0x63/0xc0
> [<ffffffff810e9dd5>] kmem_cache_alloc+0x35/0xe0
> [<ffffffff810dba31>] vm_map_ram+0x271/0x770
> [<ffffffff811e1316>] _xfs_buf_map_pages+0x46/0xe0
> [<ffffffff811e222a>] xfs_buf_get_map+0x8a/0x130
> [<ffffffff81233ab9>] xfs_trans_get_buf_map+0xa9/0xd0
> [<ffffffff8121bced>] xfs_ialloc_inode_init+0xcd/0x1d0
>
> We shouldn't be mapping buffers there, there's a patch below to fix
> this. It's probably the source of this report, even though I cannot
> lockdep seems to be off with the fairies...

I also tried to understand what lockdep was saying, but
Documentation/lockdep-design.txt is not too helpful.
I think 'CLASS'-ON-R / -ON-W means that this lock was 'ON' / held
while 'CLASS' (HARDIRQ, SOFTIRQ, RECLAIM_FS) happend and that makes
this lock unsafe for these contexts. IN-'CLASS'-R / -W seems to be
'lock taken in context 'CLASS'.
A note that 'CLASS'-ON-? means 'CLASS'-unsafe in there would be helpful to me...

Wrt. above interrupt output: I think lockdep doesn't really know about
RECLAIM_FS and threats it as another interrupt. I think that output
should have been something like this:
CPU0 CPU1
---- ----
lock(&(&ip->i_lock)->mr_lock/1);
<Allocation enters reclaim>
lock(sb_internal);
lock(&(&ip->i_lock)->mr_lock/1);
<Allocation enters reclaim>
lock(sb_internal);

Entering reclaim on CPU1 would mean that CPU1 would not enter reclaim
again, so the reclaim-'interrupt' would be disabled.
And instead of interrupts disrupting the normal codeflow on CPU0 it
would be 'interrupted' be instead of doing a normal allocation, it
would 'interrupt' the allocation to reclaim memory.
print_irq_lock_scenario() would need to be taught to print a slightly
different message for reclaim-'interrupts'.

I will try your patch, but as I do not have a reliable reproducer to
create this lockdep report, I can't really verify if this fixes it.
But I will definitely mail you, if it happens again with this patch.

Thanks, Torsten

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> xfs: inode allocation should use unmapped buffers.
>
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> Inode buffers do not need to be mapped as inodes are read or written
> directly from/to the pages underlying the buffer. This fixes a
> regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
> default behaviour").
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
> fs/xfs/xfs_ialloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 2d6495e..a815412 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -200,7 +200,8 @@ xfs_ialloc_inode_init(
> */
> d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * blks_per_cluster));
> fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
> - mp->m_bsize * blks_per_cluster, 0);
> + mp->m_bsize * blks_per_cluster,
> + XBF_UNMAPPED);
> if (!fbuf)
> return ENOMEM;
> /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/