Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning

From: Dave Chinner
Date: Thu Aug 14 2014 - 23:35:35 EST


On Wed, Aug 13, 2014 at 12:41:06PM -0400, Waiman Long wrote:
> On 08/13/2014 01:51 AM, Dave Chinner wrote:
> >On Mon, Aug 04, 2014 at 11:44:19AM -0400, Waiman Long wrote:
> >>On 08/04/2014 12:10 AM, Jason Low wrote:
> >>>On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> >>>>The rwsem_can_spin_on_owner() function currently allows optimistic
> >>>>spinning only if the owner field is defined and is running. That is
> >>>>too conservative as it will cause some tasks to miss the opportunity
> >>>>of doing spinning in case the owner hasn't been able to set the owner
> >>>>field in time or the lock has just become available.
> >>>>
> >>>>This patch enables more aggressive use of optimistic spinning by
> >>>>assuming that the lock is spinnable unless proved otherwise.
> >>>>
> >>>>Signed-off-by: Waiman Long<Waiman.Long@xxxxxx>
> >>>>---
> >>>> kernel/locking/rwsem-xadd.c | 2 +-
> >>>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>>
> >>>>diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> >>>>index d058946..dce22b8 100644
> >>>>--- a/kernel/locking/rwsem-xadd.c
> >>>>+++ b/kernel/locking/rwsem-xadd.c
> >>>>@@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> >>>> static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> >>>> {
> >>>> struct task_struct *owner;
> >>>>- bool on_cpu = false;
> >>>>+ bool on_cpu = true; /* Assume spinnable unless proved not to be */
> >>>Hi,
> >>>
> >>>So "on_cpu = true" was recently converted to "on_cpu = false" in order
> >>>to address issues such as a 5x performance regression in the xfs_repair
> >>>workload that was caused by the original rwsem optimistic spinning code.
> >>>
> >>>However, patch 4 in this patchset does address some of the problems with
> >>>spinning when there are readers. CC'ing Dave Chinner, who did the
> >>>testing with the xfs_repair workload.
> >>>
> >>This patch set enables proper reader spinning and so the problem
> >>that we see with xfs_repair workload should go away. I should have
> >>this patch after patch 4 to make it less confusing. BTW, patch 3 can
> >>significantly reduce spinlock contention in rwsem. So I believe the
> >>xfs_repair workload should run faster with this patch than both 3.15
> >>and 3.16.
> >I see lots of handwaving. I documented the test I ran when I
> >reported the problem so anyone with a 16p system and an SSD can
> >reproduce it. I don't have the bandwidth to keep track of the lunacy
> >of making locks scale these days - that's what you guys are doing.
> >
> >I gave you a simple, reliable workload that is extremely sensitive
> >to rwsem perturbations, so you should be adding it to your
> >regression tests rather than leaving it for others to notice you
> >screwed up....
> >
> >Cheers,
> >
> >Dave.
>
> If you can send me a rwsem workload that I can use for testing
> purpose, it will be highly appreciated.

<create sparse vm image file of 500TB on ssd with XFS on it>
xfs_io -f -c "truncate 500t" -c "extsize 1m" /path/to/vm/image/file

<start 16p/16GB RAM vm with image file configured as:
-drive file=/path/to/vm/image/file,if=virtio,cache=none >

In vm:

download and build fsmark from here:

git://oss.sgi.com/dgc/fs_mark

download and install xfsprogs v3.2.1 from here:

git://oss.sgi.com/xfs/cmds/xfsprogs.git tags/v3.2.1

Set up the target filesystem:

# mkfs.xfs -f -m "crc=1,finobt=1" /dev/vda
# mount -o logbsize=262144,nobarrier /dev/vda /mnt/scratch


Run:

# fs_mark -D 10000 -S0 -n 50000 -s 0 -L 32 \
-d /mnt/scratch/0 -d /mnt/scratch/1 \
-d /mnt/scratch/2 -d /mnt/scratch/3 \
-d /mnt/scratch/4 -d /mnt/scratch/5 \
-d /mnt/scratch/6 -d /mnt/scratch/7 \
-d /mnt/scratch/8 -d /mnt/scratch/9 \
-d /mnt/scratch/10 -d /mnt/scratch/11 \
-d /mnt/scratch/12 -d /mnt/scratch/13 \
-d /mnt/scratch/14 -d /mnt/scratch/15
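
For anyone scripting the run, the sixteen -d arguments above can be generated rather than typed out. The following is only a sketch: build_dir_args is a hypothetical helper name, and the inode total assumes fs_mark runs one thread per -d directory and that -n counts files per thread per loop. It prints the command for inspection instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of fs_mark): print
# "-d <base>/0 -d <base>/1 ..." for <count> directories.
build_dir_args() {
    count=$1
    base=$2
    i=0
    args=""
    while [ "$i" -lt "$count" ]; do
        args="$args -d $base/$i"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    printf '%s\n' "${args# }"
}

# Values taken from the fs_mark command line above.
dirs=16
files_per_loop=50000
loops=32

# Assumption: one thread per directory, -n files per thread per loop.
total_files=$((dirs * files_per_loop * loops))

# Print the command instead of executing it.
echo "fs_mark -D 10000 -S0 -n $files_per_loop -s 0 -L $loops $(build_dir_args "$dirs" /mnt/scratch)"
echo "expected files created: $total_files"
```

Under those assumptions the run creates about 25.6 million files, so at 200,000 creates/s it would take roughly two minutes.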

If you've got everything set up right, that should run at around
200-250,000 file creates/s. When finished, unmount and run:

# xfs_repair -o bhash=500000 /dev/vda

And that should spend quite a long while pounding on the mmap_sem
until the userspace buffer cache stops growing.

I just ran the above on 3.16 and saw this from perf:

   37.30%  [kernel]  [k] _raw_spin_unlock_irqrestore
    - _raw_spin_unlock_irqrestore
       - 62.00% rwsem_wake
          - call_rwsem_wake
             + 83.52% sys_mprotect
             + 16.23% __do_page_fault
       + 35.15% try_to_wake_up
       + 0.96% update_blocked_averages
       + 0.61% pagevec_lru_move_fn
-  23.35%  [kernel]  [k] _raw_spin_unlock_irq
    - _raw_spin_unlock_irq
       + 51.37% finish_task_switch
       + 39.37% rwsem_down_write_failed
       + 8.49% rwsem_down_read_failed
         0.62% run_timer_softirq
+   5.22%  [kernel]  [k] native_read_tsc
+   3.89%  [kernel]  [k] rwsem_down_write_failed
.....

Cheers,

Dave.

--
Dave Chinner
david@xxxxxxxxxxxxx