Re: [PATCH, RFC] ext4: Use preallocation when reading from the inode table

From: Andreas Dilger
Date: Thu Sep 25 2008 - 19:40:39 EST

Next message: Milton Miller: "Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption"
Previous message: Linus Torvalds: "Re: [RFC PATCH 1/3] Unified trace buffer"
In reply to: Theodore Tso: "Re: [PATCH, RFC] ext4: Use preallocation when reading from the inode table"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sep 24, 2008 16:35 -0400, Theodore Ts'o wrote:
> On the other hand, if we take your iop/s and translate them to
> milliseconds so we can measure the latency in the case where the
> workload is essentially doing random reads, and then cross-correlate
> them with my measurements, we get this table:

Comparing the incremental benefit of each step:

> i/o size   iop/s   ms latency   % degradation      % improvement
>                                 of random inodes   of related inodes I/O
>    4k       131      7.634
>    8k       130      7.692         0.77%              11.3%
                                    1.57%              10.5%
>   16k       128      7.813         2.34%              21.8%
                                    1.63%               7.8%
>   32k       126      7.937         3.97%              29.6%
                                    4.29%               5.9%
>   64k       121      8.264         8.26%              35.5%
                                    7.67%               4.5%
>  128k       113      8.850        15.93%              40.0%
                                   15.07%               2.4%
>  256k       100     10.000        31.00%              42.4%
>
> Depending on whether you believe that workloads involving random inode
> reads are more common compared to related inodes I/O, the sweet spot
> is probably somewhere between 32k and 128k. I'm open to opinions
> (preferably backed up with more benchmarks of likely workloads) of
> whether we should use a default value of inode_readahead_bits of 4 or
> 5 (i.e., 64k, my original guess, or 128k, in v2 of the patch). But
> yes, making it tunable is definitely going to be necessary, since
> different workloads (e.g., squid vs. git repositories) will have very
> different requirements.
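
For reference, the mapping from inode_readahead_bits to the readahead
sizes quoted above can be sketched as follows. This is only an
illustration: readahead_bytes() is a hypothetical helper, not code from
the patch, and it assumes the value is a power-of-two count of
filesystem blocks with a 4 KiB block size, which is what makes 4 and 5
line up with 64k and 128k.

```python
# Hypothetical helper, not taken from the patch.
# ASSUMPTION: inode_readahead_bits is the log2 of the number of blocks
# read ahead, and the filesystem block size is 4 KiB.
def readahead_bytes(bits, block_size=4096):
    """Return the readahead window in bytes for a given exponent."""
    return (1 << bits) * block_size


if __name__ == "__main__":
    for bits in (4, 5):
        print(f"inode_readahead_bits={bits}: {readahead_bytes(bits) // 1024}k")
```

With a different block size the same exponent would give a different
window, which is one more reason the value needs to be tunable.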

It looks like moving from 64kB to 128kB readahead might be a net loss for
"unknown" workloads, since that step adds another 7.67% of latency in the
random inode case but buys only a further 4.5% improvement in the related
(sequential) inode case. Also recall that at large scale "htree" directory
access breaks down to random inode lookup, so that isn't exactly a fringe
case (though readahead may still help if the cache is large enough).
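
The per-step comparison can be reproduced with a short script. This is a
rough sketch using only the numbers from the table in this thread; the
variable names are made up for illustration, and the step values are
taken as differences of the rounded cumulative percentages, which is how
the interleaved rows were derived.

```python
# (readahead size in KiB, measured iop/s, cumulative % improvement for
# related inodes I/O), copied from the table in this thread.
ROWS = [
    (4, 131, 0.0),
    (8, 130, 11.3),
    (16, 128, 21.8),
    (32, 126, 29.6),
    (64, 121, 35.5),
    (128, 113, 40.0),
    (256, 100, 42.4),
]

base_ms = 1000.0 / ROWS[0][1]   # latency of the 4k baseline in ms

table = []                      # one entry per readahead size
for size_k, iops, improvement in ROWS:
    latency_ms = 1000.0 / iops
    # Cumulative latency degradation relative to the 4k baseline.
    degradation = round((latency_ms / base_ms - 1.0) * 100.0, 2)
    table.append((size_k, round(latency_ms, 3), degradation, improvement))

steps = []                      # incremental cost and benefit per doubling
for prev, cur in zip(table, table[1:]):
    extra_cost = round(cur[2] - prev[2], 2)   # added random-read latency, %
    extra_gain = round(cur[3] - prev[3], 1)   # added related-inode gain, %
    steps.append((prev[0], cur[0], extra_cost, extra_gain))

if __name__ == "__main__":
    for lo_k, hi_k, cost, gain in steps:
        verdict = "cost > gain" if cost > gain else "gain > cost"
        print(f"{lo_k:>3}k -> {hi_k:>3}k: +{cost:.2f}% latency, "
              f"+{gain:.1f}% improvement ({verdict})")
```

Comparing the two percentages directly is a crude yardstick, since they
measure different workloads, but it shows where the added random-read
cost starts to exceed the added benefit for related inodes.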

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
