Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3

From: Linus Torvalds
Date: Fri Jun 09 2006 - 14:21:23 EST




On Fri, 9 Jun 2006, Andreas Dilger wrote:
> missing from one filesystem or another.
>
> > Just as an example: ext3 _sucks_ in many ways. It has huge inodes that
> > take up way too much space in memory. It has absolutely disgusting code to
> > handle directory reading and writing (buffer heads! In 2006!).
>
> My point exactly! The ext2 directory code was moved from buffer heads to
> page cache by Al after ext3 was forked and the code was never fixed in ext3.

The code was never fixed in ext3, because ext3 is a pig in that area.

You misunderstand how this worked.

The reason ext2 got fixed was that ext2 was _simple_. It got fixed
_despite_ the fact that it's not all that widely used any more, and not
considered a really important filesystem. It got fixed because it wasn't
too bad. It doesn't have all the crud that makes it a much more involved
thing to do for ext3.

So if the ext2/3 split hadn't happened, _neither_ of them would be fixed.

See?

My point is, maintaining two different pieces is SIMPLER.

Even if that simplicity sometimes ends up meaning "not maintaining the
other one".

So being out of sync is not a problem. It's a _feature_.

> On Jun 09, 2006 09:54 -0700, Linus Torvalds wrote:
> > Btw, I'm not kidding you on this one.
> >
> > THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!
>
> Do you think that would be any different with a new filesystem?

It would be bigger, if you made ext3 do 48-bit block numbers.

See? ext3 would become strictly _worse_ for the majority of users, who
wouldn't get any advantage. That's my point.

> So, the ext3 inode could grow another ~50 bytes without changing the
> slab allocation size ;-), and in fact other filesystem aren't noticably
> different.

Yes, I already pointed out that the biggest part of it was actually the
vfs_inode thing.

And btw, growing more than 50 bytes is exactly what it would do. Go look.

> This is then the biggest problem of all filesystems.

Yeah, under many loads it is. We do really badly with lots of metadata in
memory. Why do you think people have historically complained about things
like the updatedb flushing their disk cache?

If you look at disk access patterns, one of _the_ biggest problems is not
in readign individual files. It's in inode atime updates and the other
"stupid crap" stuff.

> On a 32-bit system the vfs_inode is more than half of the size of the ext3
> inode, it is worse on 64-bit systems.

..which I pointed out, and doesn't change my point one _whit_.

The fact that the block numbers aren't the _only_ problem doesn't suddenly
mean they are problem-free, does it?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/