ext3 performance bottleneck as the number of spindles gets large

From: mgross (mgross@unix-os.sc.intel.com)
Date: Wed Jun 19 2002 - 16:29:45 EST


We've been running comparisons and benchmarks of block I/O throughput
for 8KB writes as the number of SCSI adapters and drives per adapter
is increased.

The Linux platform is a dual-processor 1.2GHz PIII, 2GB of RAM, 2U box.
Similar results have been seen with both the 2.4.16 and 2.4.18 base kernels,
as well as with one of the O(1)-patched 2.4.18 kernels out there.

The benchmark is Bonnie++.

What seems to be happening is that the throughput for 8KB sequential writes
with 300MB files goes down as the number of spindles goes up. We see negative
scaling with respect to spindles per SCSI adapter, and very poor scaling per
SCSI adapter.

(The other 2-processor + OS platform sees its throughput go up with adapters
and spindles.)

Running this benchmark with lockmeter points a big finger at BKL contention
in ext3_commit_write, ext3_dirty_inode, ext3_get_block_handle, and
ext3_prepare_write (twice!). Attached is the output from the worst case:
4 SCSI adapters with 6 drives per adapter.

Has anyone done any work looking into the I/O scaling of Linux / ext3 per
spindle or per adapter? We would like to compare notes.

I've only just started to look at the ext3 code, but it seems to me that
replacing the BKL with a per-ext3-filesystem lock could remove some of the
contention that's getting measured. What data is the BKL protecting in these
ext3 functions? Could a lock-per-FS approach work?
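
To make the question concrete, here is a minimal user-space sketch of the
two locking schemes, using pthreads rather than kernel primitives. It is an
analogy only, not ext3 code: struct fs_instance, dirty_count, global_lock and
run_writers() are all hypothetical names invented for illustration, and the
sketch assumes the protected state really is private to each filesystem,
which is exactly the open question above. Note also that the BKL is recursive
and is dropped when the holder sleeps, so a plain mutex is not a drop-in
equivalent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one mounted filesystem; not an ext3 structure. */
struct fs_instance {
    pthread_mutex_t lock;  /* per-filesystem lock (the proposal) */
    long dirty_count;      /* stand-in for per-filesystem state */
};

/* Stand-in for a single system-wide lock such as the BKL. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

struct writer_args {
    struct fs_instance *fs;
    int use_per_fs_lock;
    long iterations;
};

/* One writer per filesystem: take a lock, touch that filesystem's state. */
static void *writer(void *p)
{
    struct writer_args *a = p;
    pthread_mutex_t *l = a->use_per_fs_lock ? &a->fs->lock : &global_lock;

    for (long i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(l);
        a->fs->dirty_count++;
        pthread_mutex_unlock(l);
    }
    return NULL;
}

/*
 * Run one writer thread per filesystem and return the summed counters.
 * With use_per_fs_lock == 0 every writer serializes on global_lock;
 * with use_per_fs_lock != 0 writers on different filesystems never
 * touch the same lock.
 */
long run_writers(int nfs, int use_per_fs_lock, long iterations)
{
    struct fs_instance *fs = calloc(nfs, sizeof *fs);
    struct writer_args *args = calloc(nfs, sizeof *args);
    pthread_t *tids = calloc(nfs, sizeof *tids);
    long total = 0;

    assert(fs && args && tids);
    for (int i = 0; i < nfs; i++) {
        pthread_mutex_init(&fs[i].lock, NULL);
        args[i] = (struct writer_args){ &fs[i], use_per_fs_lock, iterations };
        int rc = pthread_create(&tids[i], NULL, writer, &args[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < nfs; i++) {
        pthread_join(tids[i], NULL);
        total += fs[i].dirty_count;
        pthread_mutex_destroy(&fs[i].lock);
    }
    free(fs);
    free(args);
    free(tids);
    return total;
}
```

Calling run_writers(24, 0, n) and run_writers(24, 1, n) gives the same total
either way; the difference is only in how many threads contend for one lock.
Whether the real ext3 paths can be split like this depends on whether any
state they touch under the BKL is shared across filesystems.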

Thoughts?
Comments?
Ideas?

--mgross

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SPINLOCKS          HOLD               WAIT
  UTIL    CON   MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL  NOWAIT   SPIN  RJECT  NAME

         3.7%  0.7us(  44ms)  7.8us(  44ms)(22.9%)  49644038   96.3%   3.7%  0.00%  *TOTAL*

 26.6%  71.2%   13us(  44ms)  8.0us(8076us)( 5.8%)    632107   28.8%  71.2%     0%  ext3_commit_write+0x38
  4.4%  30.3%  4.3us( 360us)   13us(7511us)( 2.1%)    316124   69.7%  30.3%     0%  ext3_dirty_inode+0x2c
 28.1%   7.9%   14us(1660us)  9.7us(6842us)(0.78%)    632239   92.1%   7.9%     0%  ext3_get_block_handle+0x8c
  1.2%  27.2%  0.6us( 240us)   11us(6604us)( 3.0%)    632107   72.8%  27.2%     0%  ext3_prepare_write+0x34
 0.26%  88.1%  0.1us(  74us)  9.6us(7026us)( 8.6%)    632107   11.9%  88.1%     0%  ext3_prepare_write+0xe0





