Re: performance of filesystem xattrs with Samba4

From: Andreas Dilger
Date: Tue Nov 23 2004 - 17:43:47 EST


On Nov 23, 2004 00:02 +1100, tridge@xxxxxxxxx wrote:
> I've put up graphs of the first set of dbench3 results for various
> filesystems at:
>
> http://samba.org/~tridge/xattr_results/
>
> The results show that the ext3 large inode patch is extremely
> worthwhile. Using a 256 byte inode on ext3 gained a factor of up to 7x
> in performance, and only lost a very small amount when xattrs were not
> used. It took ext3 from a very mediocre performance to being the clear
> winner among current Linux journaled filesystems for performance when
> xattrs are used. Eventually I think that larger inodes should become
> the default.

For Lustre we tune the inode size at format time so that the "default"
EA data can be stored within the larger inode. Is this the case with
Samba and 256-byte inodes (i.e. is your EA data all going to fit within
the extra 124 bytes of space for storing EAs)? If you have to put any
of the commonly-used EA data into an external block, the benefits are lost.
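
As a rough way to check this, here is a minimal sketch that estimates
whether a set of EAs fits in the in-inode space. It treats the 124 bytes
mentioned above as the whole budget. The per-entry overhead (a 4-byte
area header, a 16-byte descriptor per EA, names and values padded to 4
bytes) is my assumption about the on-disk layout and should be checked
against the kernel you actually run. The EA name and sizes in the usage
note are made up for illustration.

```python
# Estimate whether a set of EAs fits in the in-inode EA area.
# ASSUMPTIONS (not from this thread): 4-byte area header, 16-byte
# descriptor per entry, names and values each padded to 4 bytes.

IN_INODE_EA_SPACE = 124  # bytes, the figure quoted for 256-byte inodes
AREA_HEADER = 4          # assumed
ENTRY_HEADER = 16        # assumed
PAD = 4                  # assumed


def padded(n, pad=PAD):
    """Round n up to the next multiple of pad."""
    return (n + pad - 1) // pad * pad


def ea_bytes_needed(eas):
    """eas maps each EA name as stored (bytes) to its value (bytes)."""
    total = AREA_HEADER
    for name, value in eas.items():
        total += ENTRY_HEADER + padded(len(name)) + padded(len(value))
    return total


def fits_in_inode(eas, space=IN_INODE_EA_SPACE):
    """True if the estimated size is within the in-inode budget."""
    return ea_bytes_needed(eas) <= space
```

With a hypothetical 9-byte name and a 56-byte value the estimate is
4 + 16 + 12 + 56 = 88 bytes, which fits. The same name with a 120-byte
value comes to 152 bytes and would spill to an external block.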

> The massive gap between ext2 and the other filesystems really shows
> clearly how much we are paying for journaling. I haven't tried any
> journal on external device or journal on nvram card tricks yet, but it
> looks like those will be worth pursuing.

One of the other things we do for Lustre right away is create the ext3
filesystem with larger journal sizes so that for the many-client cases
we do not get synchronous journal flushing if there are lots of active
threads. This can make a huge difference in overall performance at
high loads. Use "mke2fs -J size=400 ..." to create a 400MB journal
(assuming you have at least that much RAM and a large enough block
device, at least 4x the journal size just from a "don't waste space"
point of view).
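
The rule of thumb above can be written down as a small check. This is
only a sketch: the function names and the device path are hypothetical,
and it just builds the mke2fs argument list without running anything.

```python
# Sanity-check a journal size against the rule of thumb above, then
# build the matching mke2fs command line (built only, never executed).


def journal_size_ok(journal_mb, ram_mb, device_mb, min_device_ratio=4):
    """RAM at least the journal size, device at least 4x the journal."""
    return ram_mb >= journal_mb and device_mb >= min_device_ratio * journal_mb


def mke2fs_argv(device, journal_mb):
    """Argument list for 'mke2fs -J size=N device' (size in megabytes)."""
    return ["mke2fs", "-J", f"size={journal_mb}", device]


if __name__ == "__main__":
    # Hypothetical machine: 1 GB of RAM, 4 GB device at /dev/sdX1.
    if journal_size_ok(400, ram_mb=1024, device_mb=4096):
        print(" ".join(mke2fs_argv("/dev/sdX1", 400)))
```

The 4x ratio is only the "don't waste space" guideline, so it is left
as a parameter rather than a hard limit.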

The issue is not so much that you need to write that much data at one
time, but that ext3 has to reserve journal space for the worst-case
usage of each operation. So you get 40-100 threads each reserving
"worst case" and "filling" the journal (causing new operations to
block), and then completing with only a small fraction of those
reserved journal blocks actually used.
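
A toy model of that effect, to make the arithmetic concrete. This is
not how the journaling code is structured; it only counts how many
threads can hold a worst-case reservation at once. The credit counts
are invented for illustration, and a real journal cannot hand out all
of its blocks to one transaction, so the real limits are tighter.

```python
# Toy model: every thread reserves worst-case journal blocks up front,
# but ends up using only a few of them. All numbers are illustrative.


def reservation_summary(journal_blocks, threads, worst_case, used):
    """Count admitted/blocked threads and reserved vs. used blocks."""
    admitted = min(threads, journal_blocks // worst_case)
    return {
        "admitted": admitted,
        "blocked": threads - admitted,
        "reserved_blocks": admitted * worst_case,
        "used_blocks": admitted * used,
    }


if __name__ == "__main__":
    # 32 MB and 400 MB journals with 4 KB blocks, 100 active threads,
    # assumed 100 blocks reserved and 5 blocks used per operation.
    for blocks in (8192, 102400):
        print(blocks, reservation_summary(blocks, 100, 100, 5))
```

In this model the 8192-block journal admits 81 threads and blocks 19,
with 8100 blocks reserved but only 405 used. The 102400-block journal
admits all 100 threads.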

Having an external journal device also generally gives you a large
journal (by default it is the full size of the block device specified)
so sometimes the effects of the large journal are confused with the
fact that it is external. I haven't seen any perf numbers recently on
what kind of effect having an external journal has. I highly doubt that
NVRAM cards are any better than a dedicated disk for the journal, since
journal IO is write-only (except during recovery) and virtually seek-free.
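
To keep the two effects apart when benchmarking, one option is to
compare an internal and an external journal of similar size. The sketch
below only builds the mke2fs argument lists for the external case. The
device paths are placeholders and the helper names are hypothetical.
The journal device is formatted with the same block size as the
filesystem that will use it.

```python
# Build (but do not run) the mke2fs commands for an external journal.
# Device paths passed in are placeholders chosen by the caller.


def journal_device_argv(journal_dev, block_size=4096):
    """Format journal_dev as a dedicated external journal device."""
    return ["mke2fs", "-b", str(block_size), "-O", "journal_dev", journal_dev]


def fs_with_external_journal_argv(fs_dev, journal_dev, block_size=4096):
    """Create a filesystem on fs_dev that journals to journal_dev."""
    return ["mke2fs", "-b", str(block_size), "-J", f"device={journal_dev}", fs_dev]


if __name__ == "__main__":
    print(" ".join(journal_device_argv("/dev/sdY1")))
    print(" ".join(fs_with_external_journal_argv("/dev/sdX1", "/dev/sdY1")))
```

Using a journal partition about the same size as the internal journal
in the other run should make the comparison closer to like for like.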

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
