Re: dirty balancing deadlock

From: Chris Mason
Date: Sun Feb 18 2007 - 19:48:29 EST


On Mon, Feb 19, 2007 at 01:25:21AM +0100, Miklos Szeredi wrote:
> > > > If so, writes to B will decrease the dirty memory threshold.
> > >
> > > Yes, but not by enough.  Say A dirties 1100 pages and the limit is
> > > 1000.  Some pages are queued for writeback (it doesn't matter how
> > > many).  B writes back one page; 1099 dirty pages remain in A, zero in
> > > B.  balance_dirty_pages() for B doesn't know that there's nothing
> > > more to write back for B, it just waits there for those 1099 pages,
> > > which will never get written.
> >
> > hm, OK, arguable. I guess something like this..
>
> Doesn't help the fuse case, but does seem to help the loopback mount
> one.
>
> For fuse it's worse with the patch: now the write triggered by the
> balance recurses into fuse, with disastrous results, since the fuse
> writeback is now blocked on the userspace queue.
>
> fusexmp_fh_no D 40136678 0 505 494 506 504 (NOTLB)
> 08982b78 00000001 00000000 08f9f9b4 0805d8cb 089a75f8 08982b78 08f98000
> 08f98000 08f9f9dc 0805a38a 089a7100 08982680 08f9f9cc 08f98000 08f98000
> 085d8300 08982680 089a7100 08f9fa34 08183006 089a7100 08982680 089a7100
> Call Trace:
> 08f9f9a0: [<0805d8cb>] switch_to_skas+0x3b/0x83
> 08f9f9b8: [<0805a38a>] _switch_to+0x49/0x99
> 08f9f9e0: [<08183006>] schedule+0x246/0x547
> 08f9fa38: [<08103c7e>] fuse_get_req_wp+0xe9/0x14a
> 08f9fa70: [<08103d2e>] fuse_writepage+0x4f/0x12c

In general, writepage is supposed to do its work without blocking on
expensive locks that will get pdflush and dirty reclaim stuck in this
fashion.  You'll probably have to take the same approach reiserfs does
in data=journal mode, which is to leave the page dirty if fuse_get_req_wp
is going to block without making progress.
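
A minimal sketch of what that could look like, assuming a hypothetical
non-sleeping helper (fuse_get_req_nonblock() below is made up; the real
fuse_get_req_wp() in the trace above sleeps on the userspace queue):

static int fuse_writepage_sketch(struct page *page,
				 struct writeback_control *wbc)
{
	struct inode *inode = page->mapping->host;
	struct fuse_req *req;

	/* hypothetical: returns NULL instead of sleeping when no
	 * request slot is available */
	req = fuse_get_req_nonblock(get_fuse_conn(inode));
	if (!req) {
		/* can't make progress without blocking: leave the
		 * page dirty so writeback can move on to pages that
		 * can actually be cleaned */
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;
	}

	/* ... fill in req and queue the write to userspace ... */

	unlock_page(page);
	return 0;
}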

Queue it somewhere else (i.e. an internal FS cleaning thread) and leave
the page dirty so that we can move on to other pages that have a chance
of being cleaned.
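
Roughly, the cleaning-thread side could look like the sketch below.  The
cleanup_work field on fuse_conn and fuse_flush_queued_pages() are made up
for illustration; writepage would redirty the page as above and then just
do schedule_work(&fc->cleanup_work) before returning.

static void fuse_cleanup_worker(struct work_struct *work)
{
	struct fuse_conn *fc = container_of(work, struct fuse_conn,
					    cleanup_work);

	/* blocking on the userspace queue is fine here: this is the
	 * filesystem's own thread, not pdflush or dirty reclaim */
	fuse_flush_queued_pages(fc);
}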

-chris