Re: ext3 reservation question.

From: Andrew Morton
Date: Wed Apr 21 2004 - 22:42:54 EST


Linus Torvalds <torvalds@xxxxxxxx> wrote:
>
>
>
> On Wed, 21 Apr 2004, Badari Pulavarty wrote:
> >
> > I am worried about a case, where multiple threads writing to
> > different parts of same file - thereby each thread thrashing
> > reservation window (since each one has its own goal).
>
> Didn't we have a patch two years ago or something floating around with
> doing lazy (delayed) block allocation on ext2 - doing the actual
> allocation only when writing the thing out? Then you shouldn't have this
> problem under any normal load, hopefully.

That would certainly help. I had delayed allocation for ext2 all up and
running in 2.5.7 or thereabouts - most of the complexity is in managing
filesystem space reservations. If you don't care about ENOSPC the VFS at
present "just works".

I do recall deciding that there were fundamental journal-related reasons
why delalloc couldn't be made to work properly on ext3.

ummm.

The code I had at the time would reserve space in the filesystem
corresponding to the worst-case occupancy based on file offset. When we
actually hit ENOSPC in prepare_write(), we force writeout, which results in
those worst-case reservations being collapsed into their _real_ space
usage, which is much less. So writeout reclaims space in the filesystem
and prepare_write() can proceed.
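
The reserve-then-collapse scheme described above can be pictured with a
small Python model. This is only a sketch under stated assumptions, not the
2.5.7-era patch: the ToyFS class, its fields and the 12-direct /
1024-pointer geometry (ext2 with 4 KiB blocks) are illustrative, it tracks a
single file, and only the name prepare_write() comes from the text.

```python
import errno

DIRECT = 12    # direct block pointers in an ext2 inode
PTRS = 1024    # pointers per indirect block, assuming 4 KiB blocks


def indirect_path(lblock):
    """Return keys naming every indirect block on the path to `lblock`."""
    if lblock < DIRECT:
        return []
    n = lblock - DIRECT
    if n < PTRS:
        return [("ind",)]
    n -= PTRS
    if n < PTRS * PTRS:
        return [("dind",), ("dind-leaf", n // PTRS)]
    n -= PTRS * PTRS
    return [("tind",), ("tind-mid", n // (PTRS * PTRS)), ("tind-leaf", n // PTRS)]


class ToyFS:
    """Hypothetical single-file model of delayed allocation with reservations."""

    def __init__(self, free_blocks):
        self.free = free_blocks      # blocks not yet really allocated
        self.reserved = 0            # worst-case blocks promised to dirty pages
        self.dirty = set()           # logical blocks awaiting allocation
        self.mapped = set()          # data blocks already allocated
        self.indirect = set()        # indirect blocks already allocated
        self.forced_writeouts = 0

    def writeout(self):
        """Allocate for real; reservations collapse to actual usage."""
        for lblock in sorted(self.dirty):
            new_indirect = set(indirect_path(lblock)) - self.indirect
            self.indirect |= new_indirect
            self.mapped.add(lblock)
            self.free -= 1 + len(new_indirect)
        self.dirty.clear()
        self.reserved = 0

    def prepare_write(self, lblock):
        if lblock in self.dirty or lblock in self.mapped:
            return                   # space is already accounted for
        # Worst case: the data block plus every indirect block on its path.
        need = 1 + len(indirect_path(lblock))
        if self.free - self.reserved < need:
            self.forced_writeouts += 1
            self.writeout()          # reclaim the over-reservation
        if self.free - self.reserved < need:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.reserved += need
        self.dirty.add(lblock)
```

With 40 free blocks, writing logical blocks 12 to 32 reserves two blocks per
write. The 21st write finds nothing available, forces writeout, and the 40
reserved blocks collapse to 21 real ones (20 data blocks plus one shared
indirect block), so the write can go ahead.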

That worked fine on ext2. But on ext3 we have a transaction open in
prepare_write(), and the forced writeback will cause arbitrary amounts of
unexpected metadata to be pumped into the current transaction, causing the
fs to explode.

At least, I _think_ that was the problem. All is hazy.
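
If that recollection is right, the failure can be pictured with a toy model
of a transaction that was sized for one write and then asked to absorb the
metadata of a forced writeback. Everything here is hypothetical (the Handle
class, the EXPECTED_CREDITS figure, the forced_writeback_blocks parameter);
it is not the JBD API, just a sketch of a fixed budget being overrun.

```python
EXPECTED_CREDITS = 3   # hypothetical: metadata blocks one write expects to dirty


class Handle:
    """Toy stand-in for a journal handle with a fixed credit budget."""

    def __init__(self, credits):
        self.credits = credits

    def dirty_metadata(self, nblocks):
        # Each dirtied metadata block consumes one credit of the budget.
        if nblocks > self.credits:
            raise RuntimeError("transaction credits exhausted")
        self.credits -= nblocks


def prepare_write(handle, forced_writeback_blocks=0):
    # Metadata the write itself planned for when the handle was started.
    handle.dirty_metadata(EXPECTED_CREDITS)
    # Metadata dragged in by flushing other files' delayed allocations.
    handle.dirty_metadata(forced_writeback_blocks)
```

An ordinary write fits its budget exactly, while a write that triggers a
forced writeback of, say, 50 metadata blocks overruns it. On ext2 there is
no such budget, which would explain why the same trick was harmless there.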



Alex Tomas has current patches which do delalloc, but I don't know if they
do all the reservation stuff yet.

We would still face layout problems on SMP - two or more CPUs allocating
blocks in parallel. Could be solved by serialising writeback in some
manner - the fs-writeback.c code does that to some extent already.
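
The layout problem can be shown with a deterministic toy, with no real
threads involved. It assumes a naive allocator that hands out the next free
block to whoever asks; the real ext2/ext3 allocators use goals and block
groups, so this only illustrates the interleaving effect. The allocate() and
extents() helpers are made up for the example.

```python
from itertools import count


def allocate(order):
    """Hand out consecutive block numbers in request order.

    `order` is a sequence of file names, one entry per block requested.
    """
    next_free = count()
    layout = {}
    for name in order:
        layout.setdefault(name, []).append(next(next_free))
    return layout


def extents(blocks):
    """Count runs of physically contiguous blocks."""
    return sum(1 for i, b in enumerate(blocks)
               if i == 0 or b != blocks[i - 1] + 1)


# Two CPUs writing back files "a" and "b" at the same time, alternating.
parallel = allocate(["a", "b"] * 4)
# The same requests with writeback serialised per file.
serial = allocate(["a"] * 4 + ["b"] * 4)
```

In the parallel case each file ends up in four single-block extents; in the
serialised case each file is one contiguous extent.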

