Re: [patch] Converting writeback linked lists to a tree based data structure

From: Fengguang Wu
Date: Tue Jan 15 2008 - 23:26:20 EST


On Tue, Jan 15, 2008 at 07:44:15PM -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2008 11:01:08 +0800 Fengguang Wu <wfg@xxxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Jan 15, 2008 at 09:53:42AM -0800, Michael Rubin wrote:
> > > On Jan 15, 2008 12:46 AM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> > > > Just a quick question: how does this interact with, or depend upon,
> > > > Fengguang's patches I still have in my mailbox? (Those from Dec 28th)
> > >
> > > They don't. They apply to a 2.6.24rc7 tree. This is a candidate for 2.6.25.
> > >
> > > This work was done before Fengguang's patches. I am trying to test
> > > Fengguang's for comparison but am having problems with getting mm1 to
> > > boot on my systems.
> >
> > Yeah, they are independent ones. The initial motivation is to fix the
> > bug "sluggish writeback on small+large files". Michael introduced
> > a new rbtree, and I introduced a new list (s_more_io_wait).
> >
> > Basically I think an rbtree is overkill for time-based ordering.
> > Sorry, Michael. But s_dirty would be enough for that. Plus, s_more_io
> > provides fair queuing between small/large files, and s_more_io_wait
> > provides waiting mechanism for blocked inodes.
> >
> > The time-ordered rbtree may delay io for a blocked inode simply by
> > modifying its dirtied_when and reinserting it. But it would no longer be
> > that easy if it is to be ordered by location.
>
> What does the term "ordered by location" mean? Attempting to sort inodes by
> physical disk address? By using their i_ino as a key?
>
> That sounds optimistic.

Yes, exactly. Think about email servers with lots of dirty files.
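
To make "ordered by location" concrete, here is a rough userspace sketch
that sorts a batch of dirty inodes with i_ino as the key. The struct
demo_inode type and the sort_by_location() helper are made up for
illustration and are not kernel code; treating i_ino as a proxy for disk
location is itself an assumption that only holds on some filesystems.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few inode fields that matter here. */
struct demo_inode {
	unsigned long i_ino;        /* inode number, assumed location proxy */
	unsigned long dirtied_when; /* time the inode was first dirtied */
};

/* qsort comparator: ascending i_ino, written to avoid overflow. */
static int cmp_by_ino(const void *a, const void *b)
{
	const struct demo_inode *x = a;
	const struct demo_inode *y = b;

	return (x->i_ino > y->i_ino) - (x->i_ino < y->i_ino);
}

/* Reorder a batch of dirty inodes by i_ino before writing them out. */
static void sort_by_location(struct demo_inode *batch, size_t n)
{
	assert(batch != NULL || n == 0);
	if (n > 1)
		qsort(batch, n, sizeof(*batch), cmp_by_ino);
}
```

With many small dirty files, a pass in this order would visit inodes in
ascending inode number rather than in the order they were dirtied.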

> > If we are going to do location based ordering in the future, the lists
> > will continue to be useful. It would simply be a matter of switching
> > from the s_dirty (order by time) to some rbtree or radix tree (order by
> > location).
> >
> > We can even provide both orderings at the same time to different
> > fs/inodes, configurable by the user, because the s_dirty list
> > and/or rbtree would provide _only_ ordering (not fairness or waiting)
> > and hence are interchangeable.
> >
> > This patchset could be a good reference. It does location based
> > ordering with radix tree:
> >
> > [RFC][PATCH] clustered writeback <http://lkml.org/lkml/2007/8/27/45>
>
> list_heads are just the wrong data structure for this function. Especially
> list_heads which are protected by a non-sleeping lock.

list_heads are OK if we use each of them for one and only one function. We have
been trying to jam too much into s_dirty in the past. Grabbing a
refcount could be better than locking - anyway if we split the
functions today, it would be easy to replace the list_heads one by
one in the future.
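
A rough userspace sketch of that split, one queue per function. Only the
names s_dirty, s_more_io and s_more_io_wait come from the patches; the
wb_* types and helpers below are hypothetical, and the real code uses
list_heads under a lock rather than this toy singly linked queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature inode; not the kernel's struct inode. */
struct wb_inode {
	unsigned long i_ino;
	unsigned long dirtied_when;
	struct wb_inode *next;
};

/* Simple FIFO queue standing in for a list_head. */
struct wb_queue {
	struct wb_inode *head;
	struct wb_inode *tail;
	size_t len;
};

/* One queue per function, so each can be replaced independently. */
struct wb_sb {
	struct wb_queue s_dirty;        /* ordering (by dirty time) only */
	struct wb_queue s_more_io;      /* fairness between small/large files */
	struct wb_queue s_more_io_wait; /* waiting place for blocked inodes */
};

enum wb_result { WB_DONE, WB_MORE, WB_BLOCKED };

static void wb_push(struct wb_queue *q, struct wb_inode *inode)
{
	assert(inode != NULL);
	inode->next = NULL;
	if (q->tail)
		q->tail->next = inode;
	else
		q->head = inode;
	q->tail = inode;
	q->len++;
}

static struct wb_inode *wb_pop(struct wb_queue *q)
{
	struct wb_inode *inode = q->head;

	if (!inode)
		return NULL;
	q->head = inode->next;
	if (!q->head)
		q->tail = NULL;
	inode->next = NULL;
	q->len--;
	return inode;
}

/* After a writeback attempt, park the inode on the queue for its state. */
static void wb_requeue(struct wb_sb *sb, struct wb_inode *inode,
		       enum wb_result res)
{
	switch (res) {
	case WB_MORE:
		wb_push(&sb->s_more_io, inode);
		break;
	case WB_BLOCKED:
		wb_push(&sb->s_more_io_wait, inode);
		break;
	case WB_DONE:
		break; /* clean: stays off all queues */
	}
}
```

In this layout, swapping s_dirty for an rbtree or radix tree would touch
only the ordering queue; the fairness and waiting queues stay as they are.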
