[4.2, Regression] Queued spinlocks cause major XFS performance regression

From: Dave Chinner
Date: Fri Sep 04 2015 - 01:48:37 EST

Next message: maitysanchayan: "Re: [PATCH v9 0/4] Implement OCOTP driver for Vybrid using NVMEM"
Previous message: Tomeu Vizoso: "Re: [PATCH v3 01/18] platform: delay OF device-driver matches until late_initcall"
Next in thread: Linus Torvalds: "Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Waiman,

For the first time in months I just turned of spinlock debugging on
my performance test machine and I just got an unpleasant surprise on
my standard inode allocation and reclaim test. I've described this
test to you before, because it's found regressions in your previous
lock scaling changes:

http://permalink.gmane.org/gmane.linux.kernel/1768786

This time it is the fsmark run that I use to populate the filesystem
that is demonstrating a locking regression. I'll asked you before
if you could add this test to your lock scaling regression test
suite; please do it this time.

Now, the regression. With spinlock debugging turned on, the
performance of my usual XFS inode allocation benchmark using fsmark
reports performance like this:

FSUse% Count Size Files/sec App Overhead
0 1600000 0 312594.0 9944159
0 3200000 0 295668.6 10399679
0 4800000 0 279026.1 11397617
.....

This has been pretty stable for several releases - it varies +/- a
few percent, but it's pretty much been like this since about 3.2
when CONFIG_XFS_DEBUG=n, with or without basic spinlock debugging.

When I turned spinlock debugging off on 4.2 to get some perf numbers
a request from Linus, I got this:

FSUse% Count Size Files/sec App Overhead
0 1600000 0 114143.9 9597599
0 3200000 0 95486.9 9460413
0 4800000 0 93918.2 9784699
....

All 16 CPUs were pegged at 100% cpu usage. I took a quick look at
perf:

67.32% [kernel] [k] queued_spin_lock_slowpath
5.17% [kernel] [k] xfs_log_commit_cil
2.47% [kernel] [k] _xfs_buf_find
1.37% [kernel] [k] _raw_spin_lock
....

And then a quick call graph sample to find the lock:

37.19% 37.19% [kernel] [k] queued_spin_lock_slowpath
- queued_spin_lock_slowpath
- 99.98% _raw_spin_lock
- 89.16% xfs_log_commit_cil
- __xfs_trans_commit
- 98.48% xfs_trans_commit
xfs_create
xfs_generic_create
xfs_vn_mknod
xfs_vn_create
vfs_create
path_openat
do_filp_open
do_sys_open
sys_open
entry_SYSCALL_64_fastpath
+ __GI___libc_open
+ 1.52% __xfs_trans_roll

This shows that we have catastrophic spinlock contention in the
transaction commit path. The cil->xc_cil_lock spin lock as it's the
only spinlock in that path. And while it's the hot lock in the
commit path, turning spinlock debugging back on (and no other
changes) shows that it shouldn't be contended:

8.92% [kernel] [k] _xfs_buf_find
5.51% [kernel] [k] xfs_dir2_node_addname
3.49% [kernel] [k] xfs_dir3_free_hdr_from_disk
3.45% [kernel] [k] do_raw_spin_lock
3.06% [kernel] [k] kmem_cache_alloc
2.99% [kernel] [k] __memcpy
2.97% [kernel] [k] xfs_log_commit_cil
.....

And the call graph:

4.52% 0.24% [kernel] [k] _raw_spin_lock
- _raw_spin_lock
+ 43.63% xfs_log_commit_cil
+ 9.76% inode_sb_list_add
+ 8.57% list_lru_add
+ 6.37% _xfs_buf_find
+ 6.08% d_alloc
+ 4.18% dput
+ 4.08% __d_instantiate
...

IOWs, about 2% of the cpu usage is lock traffic through
cil->xc_cil_lock, and that's 3x the locking rate where the queued
spinlock becomes catastrophically contention bound.

To confirm that this is indeed caused by the queued spinlocks, I
removed the the spinlock debugging and did this to arch/x86/Kconfig:

- select ARCH_USE_QUEUED_SPINLOCK

And the results are:

FSUse% Count Size Files/sec App Overhead
0 1600000 0 329310.4 9727415
0 3200000 0 305421.4 10421358
0 4800000 0 294593.3 11007112
0 6400000 0 283863.7 12557190
....

So the problem is definitely the queued spinlock...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: maitysanchayan: "Re: [PATCH v9 0/4] Implement OCOTP driver for Vybrid using NVMEM"
Previous message: Tomeu Vizoso: "Re: [PATCH v3 01/18] platform: delay OF device-driver matches until late_initcall"
Next in thread: Linus Torvalds: "Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]