Re: [RFC][PATCH 0/7] locking: qspinlock

From: Dave Chinner
Date: Tue Mar 11 2014 - 22:32:47 EST


On Mon, Mar 10, 2014 at 04:42:36PM +0100, Peter Zijlstra wrote:
> Hi Waiman,
>
> I promised you this series a number of days ago; sorry for the delay, I've been
> somewhat unwell :/
>
> That said, these few patches start with a (hopefully) simple and correct form
> of the queue spinlock, and then gradually build upon it, explaining each
> optimization as we go.
>
> Having these optimizations as separate patches helps twofold; firstly it makes
> one aware of which exact optimizations were done, and secondly it allows one to
> prove or disprove any one step, seeing how they should be mostly identity
> transforms.
>
> The resulting code is near to what you posted I think; however it has one
> atomic op less in the pending wait-acquire case for NR_CPUS != huge. It also
> doesn't do lock stealing; it's still perfectly fair afaict.
>
> Have I missed any tricks from your code?
>
> The patches apply to tip/master + lkml.kernel.org/r/20140210195820.834693028@xxxxxxxxxxxxx
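For anyone reading along without the patches in front of them: the basic
building block being described above is an MCS-style queued lock, where each
waiter spins on its own node and the lock only records the tail of the queue;
the later patches then compress that tail, plus a "pending" bit for the first
contender, into a single 32-bit lock word. A rough userspace sketch of just
that first step (not the patch code, names made up):

/*
 * Minimal MCS-style queued spinlock sketch (illustrative only).
 * Each waiter spins on its own node, so one cache line is handed
 * over per unlock instead of everybody hammering the lock word.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	_Atomic bool		   locked;	/* set by our predecessor */
};

struct mcs_lock {
	_Atomic(struct mcs_node *) tail;	/* last waiter in the queue */
};

/* stands in for the kernel's per-CPU, per-context MCS nodes */
static _Thread_local struct mcs_node me;

static void mcs_lock_acquire(struct mcs_lock *lock)
{
	struct mcs_node *prev;

	atomic_store(&me.next, NULL);
	atomic_store(&me.locked, false);

	/* atomically append ourselves to the queue */
	prev = atomic_exchange(&lock->tail, &me);
	if (!prev)
		return;				/* queue was empty: we own the lock */

	/* link in behind our predecessor and spin locally */
	atomic_store(&prev->next, &me);
	while (!atomic_load(&me.locked))
		;
}

static void mcs_lock_release(struct mcs_lock *lock)
{
	struct mcs_node *expected = &me;
	struct mcs_node *next = atomic_load(&me.next);

	if (!next) {
		/* nobody visibly queued: try to mark the queue empty */
		if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
			return;
		/* a new waiter won the tail race; wait for it to link in */
		while (!(next = atomic_load(&me.next)))
			;
	}
	atomic_store(&next->locked, true);	/* hand the lock to the successor */
}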

FWIW, I saw this patchset referenced on LWN, and noted that the lock
contention that AIM7 saw was in the VFS and XFS in particular. So,
I pulled out my trusty "smash the VFS locks" test on a 16p VM. The
multithreaded bulkstat test *hammers* the inode cache. I've got a
filesystem with 51.2 million inodes in it, and it pegs all 16 CPUs
with this profile:

$ sudo time src/bstat -t -q /mnt/scratch
0.01user 2300.09system 2:28.13elapsed 1552%CPU (0avgtext+0avgdata 56268maxresident)k
0inputs+0outputs (12major+133515minor)pagefaults 0swaps
$
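For anyone who hasn't seen it, a bulkstat driver like bstat just walks every
inode in the filesystem through the XFS bulkstat ioctl, which hands back
batches of struct xfs_bstat plus a resume cursor; bstat itself splits the
inode number space across threads, but a minimal single-threaded sketch of
the loop (assuming the xfsprogs headers; error handling trimmed) looks
roughly like:

/* Walk every inode via XFS_IOC_FSBULKSTAT; bstat runs many of these in parallel. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <xfs/xfs.h>	/* XFS_IOC_FSBULKSTAT, xfs_fsop_bulkreq, xfs_bstat */

int main(int argc, char **argv)
{
	static struct xfs_bstat buf[1024];
	struct xfs_fsop_bulkreq req;
	__u64 lastino = 0;		/* resume cursor, advanced by the kernel */
	__s32 count = 0;
	int fd = open(argv[1], O_RDONLY);

	if (fd < 0)
		return 1;

	req.lastip = &lastino;
	req.icount = 1024;		/* inodes per call */
	req.ubuffer = buf;
	req.ocount = &count;

	/* every batch instantiates up to 1024 inodes in the VFS inode cache */
	while (ioctl(fd, XFS_IOC_FSBULKSTAT, &req) == 0 && count > 0)
		printf("batch of %d, last ino %llu\n",
		       count, (unsigned long long)lastino);

	close(fd);
	return 0;
}

Each of those inodes is instantiated and then evicted again almost
immediately, which is what drives the profile below.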
.....
73.95% [kernel] [k] _raw_spin_lock
   + 48.65% evict
   + 48.21% inode_sb_list_add
   + 1.31% inode_wait_for_writeback
     0.51% __remove_inode_hash
           evict
           iput
           xfs_bulkstat_one_int
           xfs_bulkstat_one
           xfs_bulkstat
           xfs_ioc_bulkstat
           xfs_file_ioctl
           do_vfs_ioctl
           sys_ioctl
           system_call_fastpath
           __GI___ioctl
+ 4.87% [kernel] [k] native_read_tsc
+ 2.72% [kernel] [k] do_raw_spin_unlock
+ 2.36% [kernel] [k] _xfs_buf_find
+ 1.35% [kernel] [k] delay_tsc
+ 1.29% [kernel] [k] __crc32c_le
+ 1.19% [kernel] [k] __slab_alloc
+ 0.93% [kernel] [k] xfs_setup_inode
+ 0.73% [kernel] [k] _raw_spin_unlock_irqrestore

It's almost entirely spinlock contention on the inode list lock,
just like the AIM7 disk test that was referenced here:

http://lwn.net/Articles/590268/
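
Both of the hot callers in that profile are hitting the same lock: in this
kernel the per-superblock inode list is still protected by a single global
inode_sb_list_lock, so every inode that bulkstat instantiates takes it once
in inode_sb_list_add() and once more on the way out in evict(). Paraphrasing
the shape of fs/inode.c from memory (not a verbatim quote):

#include <linux/fs.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* one global lock protecting every superblock's s_inodes list */
extern spinlock_t inode_sb_list_lock;

void inode_sb_list_add(struct inode *inode)
{
	spin_lock(&inode_sb_list_lock);
	list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
	spin_unlock(&inode_sb_list_lock);
}

/* called from evict() as the inode is torn down */
static void inode_sb_list_del(struct inode *inode)
{
	if (!list_empty(&inode->i_sb_list)) {
		spin_lock(&inode_sb_list_lock);		/* same global lock again */
		list_del_init(&inode->i_sb_list);
		spin_unlock(&inode_sb_list_lock);
	}
}

With 51.2 million inodes streaming through the cache, that is roughly a
hundred million acquisitions of one global spinlock spread across 16 CPUs.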


With the queuing spinlock, I expected to see somewhat better results, but
at first I didn't. It turns out that if you have any sort of lock debugging
turned on, the code never drops into the queued slow path, and hence never
enters the "lock failed" path where all the contention fixes are supposed
to be.
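
That also explains the native_read_tsc and delay_tsc hits in the profile:
with spinlock debugging (CONFIG_DEBUG_SPINLOCK, at least) the contended path
goes through the debug wrapper in lib/spinlock_debug.c, which just retries
the arch trylock in a delay loop rather than queueing. Paraphrasing its
shape from memory (checks and details trimmed):

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 loops = loops_per_jiffy * HZ;
	u64 i;

	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);	/* this is where the delay_tsc/native_read_tsc time goes */
	}
	/* only after a long stretch of spinning does it complain and fall back */
	spin_dump(lock, "lockup suspected");
	arch_spin_lock(&lock->raw_lock);
}

void do_raw_spin_lock(raw_spinlock_t *lock)
{
	debug_spin_lock_before(lock);	/* magic/owner sanity checks */
	if (unlikely(!arch_spin_trylock(&lock->raw_lock)))
		__spin_lock_debug(lock);
	debug_spin_lock_after(lock);
}

So with debugging enabled you are really benchmarking the trylock-and-delay
loop above, not the queued slow path.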

Anyway, with all lock debugging turned off, the system hangs
the instant I start the multithreaded bulkstat workload. Even the
console is unresponsive. I did get a partial task trace, indicating
everything was locked up on the contended lock:

[ 64.992006] bstat R running task 5128 4488 4397 0x0000000a
[ 64.992006] ffff88041b385a18 ffffffff81ce6c81 ffff88041b385a38 ffffffff811bee8d
[ 64.992006] ffff88041b385a58 ffff8800b8527000 ffff88041b385a58 ffffffff814a0854
[ 64.992006] 0000002c80004321 ffff8800b8527000 ffff88041b385b08 ffffffff8149998c
[ 64.992006] Call Trace:
[ 64.992006] [<ffffffff81ce6c81>] ? _raw_spin_lock+0x21/0x30
[ 64.992006] [<ffffffff811bee8d>] inode_sb_list_add+0x1d/0x60
[ 64.992006] [<ffffffff814a0854>] xfs_setup_inode+0x34/0x310
[ 64.992006] [<ffffffff8149998c>] xfs_iget+0x4ec/0x750
[ 64.992006] [<ffffffff814a0b30>] ? xfs_setup_inode+0x310/0x310
[ 64.992006] [<ffffffff814a0cb7>] xfs_bulkstat_one_int+0xa7/0x340
[ 64.992006] [<ffffffff814a0f70>] xfs_bulkstat_one+0x20/0x30
[ 64.992006] [<ffffffff814a143c>] xfs_bulkstat+0x4bc/0xa10
[ 64.992006] [<ffffffff814a0f50>] ? xfs_bulkstat_one_int+0x340/0x340
[ 64.992006] [<ffffffff8149bdb0>] xfs_ioc_bulkstat+0xd0/0x1b0
[ 64.992006] [<ffffffff8149d114>] xfs_file_ioctl+0x3e4/0xae0
[ 64.992006] [<ffffffff81cea6fc>] ? __do_page_fault+0x1fc/0x4f0
[ 64.992006] [<ffffffff811b79e3>] do_vfs_ioctl+0x83/0x500
[ 64.992006] [<ffffffff811b7ef1>] SyS_ioctl+0x91/0xb0
[ 64.992006] [<ffffffff81cef652>] system_call_fastpath+0x16/0x1b

And the rest of the threads were in a cond_resched() right after the
code where the lock was taken in the evict path:

[ 64.992006] bstat R running task 4576 4493 4397 0x00000000
[ 64.992006] ffff880419c95a48 0000000000000082 ffff880419ec1820 0000000000012e00
[ 64.992006] ffff880419c95fd8 0000000000012e00 ffff880419509820 ffff880419ec1820
[ 64.992006] ffff880419c95a88 ffff8800b8a8a1a8 ffff8800b8a8a000 ffff8800b8a8a1a8
[ 64.992006] Call Trace:
[ 64.992006] [<ffffffff81ce3099>] _cond_resched+0x29/0x40
[ 64.992006] [<ffffffff811be526>] clear_inode+0x16/0x70
[ 64.992006] [<ffffffff814a59e9>] xfs_fs_evict_inode+0x59/0xf0
[ 64.992006] [<ffffffff811bf96f>] evict+0xaf/0x1b0
[ 64.992006] [<ffffffff811c00d3>] iput+0x103/0x190
[ 64.992006] [<ffffffff814a0b30>] ? xfs_setup_inode+0x310/0x310
[ 64.992006] [<ffffffff814a0e97>] xfs_bulkstat_one_int+0x287/0x340
[ 64.992006] [<ffffffff814a0f70>] xfs_bulkstat_one+0x20/0x30
[ 64.992006] [<ffffffff814a143c>] xfs_bulkstat+0x4bc/0xa10
[ 64.992006] [<ffffffff814a0f50>] ? xfs_bulkstat_one_int+0x340/0x340
[ 64.992006] [<ffffffff8149bdb0>] xfs_ioc_bulkstat+0xd0/0x1b0
[ 64.992006] [<ffffffff8149d114>] xfs_file_ioctl+0x3e4/0xae0
[ 64.992006] [<ffffffff81cea6fc>] ? __do_page_fault+0x1fc/0x4f0
[ 64.992006] [<ffffffff811b79e3>] do_vfs_ioctl+0x83/0x500
[ 64.992006] [<ffffffff811b7ef1>] SyS_ioctl+0x91/0xb0
[ 64.992006] [<ffffffff81cef652>] system_call_fastpath+0x16/0x1b

So either this patchset doesn't work right now, or there's something
else broken in the tip/master tree...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx