Re: vm kills processes in our 2.3.12 port of reiserfs - what was the

Hans Reiser (reiser@idiom.com)
Wed, 01 Sep 1999 01:27:08 +0400


Reiserfs data is block aligned for all but the last 4050 bytes of a file (the
tail). Directories are also not block aligned. The tail is the whole file for
small files. I don't understand why you guys think that reiserfs is somehow
harder to integrate with the page cache (except for tails, which really aren't
that hard, especially compared with how hard it is to deal with ext2's 1k blocks
on a 4k page).

I think it is more than a little unfair to let the ext2 folks rewrite VFS and
ext2 in parallel, and then announce an impending code freeze right after
VFS has been radically changed. End of whine. With time, and a few confused
postings to linux-kernel, we will catch up to these changes,
and learn how to avoid ext2 style read-ahead, how to insert our multiple-file
read-ahead, how to add SGI style allocate blocks on flush, etc. Until then we
can have a 2.3 port that works, and that real users will want to use.

So, as I understand it, we should create a check_then_mark_buffer_dirty(), put
the checking to see if there are too many dirty buffers in there, and stick it
everywhere we have mark_buffer_dirty() in 2.2.11. Everywhere we have our 2.2.11
custom coded schedule-free mark buffer dirty, we put mark_buffer_dirty(), and
pray it doesn't get changed yet again:-).
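
A rough userspace sketch of the wrapper described above, only to make the
split between the two variants concrete. Everything in it is an assumption
made for illustration: struct buffer, MAX_DIRTY, nr_dirty, flush_dirty(),
mark_dirty_nocheck() and check_then_mark_dirty() are hypothetical stand-ins,
not the 2.2.11 or 2.3.12 kernel interfaces, and the real dirty-buffer
balancing is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head; not the kernel structure. */
struct buffer {
    bool dirty;
};

#define MAX_DIRTY 4   /* assumed threshold, chosen only for the example */

int nr_dirty = 0;     /* number of buffers currently marked dirty */
int nr_flushes = 0;   /* how many times we had to stop and flush */

/* Unconditional variant: never blocks, for paths that must not schedule. */
void mark_dirty_nocheck(struct buffer *bh)
{
    assert(bh != NULL);
    if (!bh->dirty) {
        bh->dirty = true;
        nr_dirty++;
    }
}

/* Stand-in for writing dirty buffers out; here it just marks them clean. */
void flush_dirty(struct buffer *all, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (all[i].dirty) {
            all[i].dirty = false;
            nr_dirty--;
        }
    }
    nr_flushes++;
}

/* Checking variant: balance first if too much is dirty, then mark. */
void check_then_mark_dirty(struct buffer *bh, struct buffer *all, size_t n)
{
    if (nr_dirty >= MAX_DIRTY)
        flush_dirty(all, n);   /* the point where a real kernel may sleep */
    mark_dirty_nocheck(bh);
}
```

Dirtying eight buffers in a row through check_then_mark_dirty() with this
threshold triggers one flush on the fifth call. The point of keeping the two
entry points separate is that the checking one may block, so it can only go
where sleeping is acceptable.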

We have hired an SMP specialist, and he is going to dump all our schedule
tracking crud that currently causes us to have our own version of getblk,
etc., but it will take him time to do that, as he is new.

Now that we can fix the dbench bug in a manner consistent with the rest of
the kernel, I am happy.

Hans

"Theodore Y. Ts'o" wrote:

> Date: Tue, 31 Aug 1999 08:22:30 -0700 (PDT)
> From: Linus Torvalds <torvalds@transmeta.com>
>
> And as the normal filesystems are all using the common buffer routines
> anyway, it's only the special cases that are hurt by having to explicitly
> do the dirty balancing. In fact, I suspect that reiserfs should =really=
> try to use the common routines too - it will never beat the standard
> filesystems in performance if it doesn't (the old-fashioned way of having
> data in both buffer-cache and the page cache not only uses memory, but is
> much slower for certain memory-mapped operations).
>
> My guess is that reiserfs can't use the standard buffer routines in many
> cases since the file data isn't necessarily block aligned. (I don't
> know if this is true for small files, or for all files, but there are
> definitely many cases where file data isn't located at a specific block
> location, and so the standard page cache routines probably won't work
> for them.)

See above.

>
>
> The idea (as I understand it; Hans I am sure will correct me if I'm
> wrong) is that you win because you don't have to seek to another block
> to read the data, since the data is stored in-line in the B-tree. Note

Yes, and you also save on space, and get sundry other advantages.
You lose nothing. All remaining flaws in reiserfs performance
are due to peripheral code now being ripped out one section at a time.
Example: next month we get a new file write with extents, allocate blocks on
flush, and higher performance hole creation. Even with the flaws that remain,
performance is on the whole better than other approaches (see MySQL
benchmarks and dbench benchmarks).

>
> that I am *not* defending this design decision; just pointing out one of
> the consequences of it. Personally, I think it's radical enough that
> it's worth experimenting with, but I'm not convinced that it's going to

Experiment concluded: it works:-)

Try it.... add code....

>
> be a win. (I investigated doing something like this by storing data in
> the inode ala fast inodes, and decided it wasn't worth the complexity
> cost, but reiserfs should be able to store larger small files within the
> B-tree than I was.)
>
> - Ted

Best,

Hans

--
Get Linux (http://www.kernel.org) plus ReiserFS
 (http://devlinux.org/namesys).  If you sell an OS or
internet appliance, buy a port of ReiserFS!  If you
need customizations and industrial grade support, we sell them.
