Re: dirty balancing deadlock
From: Chris Mason
Date: Sun Feb 18 2007 - 20:02:35 EST
On Mon, Feb 19, 2007 at 01:54:31AM +0100, Miklos Szeredi wrote:
> > > > > > If so, writes to B will decrease the dirty memory threshold.
> > > > >
> > > > > Yes, but not by enough. Say A dirties a 1100 pages, limit is 1000.
> > > > > Some pages queued for writeback (doesn't matter how much). B writes
> > > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for
> > > > > B doesn't know that there's nothing more to write back for B, it's
> > > > > just waiting there for those 1099, which'll never get written.
> > > >
> > > > hm, OK, arguable. I guess something like this..
> > >
> > > Doesn't help the fuse case, but does seem to help the loopback mount
> > > one.
> > >
> > > For fuse it's worse with the patch: now the write triggered by the
> > > balance recurses into fuse, with disastrous results, since the fuse
> > > writeback is now blocked on the userspace queue.
> > >
> > > fusexmp_fh_no D 40136678 0 505 494 506 504 (NOTLB)
> > > 08982b78 00000001 00000000 08f9f9b4 0805d8cb 089a75f8 08982b78 08f98000
> > > 08f98000 08f9f9dc 0805a38a 089a7100 08982680 08f9f9cc 08f98000 08f98000
> > > 085d8300 08982680 089a7100 08f9fa34 08183006 089a7100 08982680 089a7100 Call Trace:
> > > 08f9f9a0: [<0805d8cb>] switch_to_skas+0x3b/0x83
> > > 08f9f9b8: [<0805a38a>] _switch_to+0x49/0x99
> > > 08f9f9e0: [<08183006>] schedule+0x246/0x547
> > > 08f9fa38: [<08103c7e>] fuse_get_req_wp+0xe9/0x14a
> > > 08f9fa70: [<08103d2e>] fuse_writepage+0x4f/0x12c
> > In general, writepage is supposed to do work without blocking on
> > expensive locks that will get pdflush and dirty reclaim stuck in this
> > fashion. You'll probably have to take the same approach reiserfs does
> > in data=journal mode, which is leaving the page dirty if fuse_get_req_wp
> > is going to block without making progress.
> Pdflush, and dirty reclaim set wbc->nonblocking to true.
> balance_dirty_pages and fsync don't. The problem here is that
> Andrew's patch is wrong to let balance_dirty_pages() try to write back
> pages from a different queue.
async or sync, writepage is supposed to either make progress or bail.
loopback aside, if the fuse call is blocking long term, you're going to
run into problems.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/