Re: [PATCH 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily

From: Oleg Nesterov
Date: Fri Nov 02 2012 - 12:17:23 EST


On 11/01, Oleg Nesterov wrote:
>
> On 11/01, Paul E. McKenney wrote:
> >
> > OK, so it looks to me that this code relies on synchronize_sched()
> > forcing a memory barrier on each CPU executing in the kernel.
>
> No, the patch tries to avoid this assumption, but probably I missed
> something.
>
> > 1. A task running on CPU 0 currently write-holds the lock.
> >
> > 2. CPU 1 is running in the kernel, executing a longer-than-average
> > loop of normal instructions (no atomic instructions or memory
> > barriers).
> >
> > 3. CPU 0 invokes percpu_up_write(), calling up_write(),
> > synchronize_sched(), and finally mutex_unlock().
>
> And my expectation was, this should be enough because ...
>
> > 4. CPU 1 executes percpu_down_read(), which calls update_fast_ctr(),
>
> since update_fast_ctr does preempt_disable/enable it should see all
> modifications done by CPU 0.
>
> IOW. Suppose that the writer (CPU 0) does
>
> percpu_down_write();
> STORE;
> percpu_up_write();
>
> This means
>
> STORE;
> synchronize_sched();
> mutex_unlock();
>
> Now. Do you mean that the next preempt_disable/enable can see the
> result of mutex_unlock() but not STORE?

So far I think this is not possible, so the code doesn't need the
additional wstate/barriers.

> > +static bool update_fast_ctr(struct percpu_rw_semaphore *brw, int val)
> > +{
> > + bool success = false;
>
> int state;
>
> > +
> > + preempt_disable();
> > + if (likely(!mutex_is_locked(&brw->writer_mutex))) {
>
> state = ACCESS_ONCE(brw->wstate);
> if (likely(!state)) {
>
> > + __this_cpu_add(*brw->fast_read_ctr, val);
> > + success = true;
>
> } else if (state & WSTATE_NEED_MB) {
> __this_cpu_add(*brw->fast_read_ctr, val);
> smp_mb(); /* Order increment against critical section. */
> success = true;
> }

...

> > +void percpu_up_write(struct percpu_rw_semaphore *brw)
> > +{
> > + /* allow the new readers, but only the slow-path */
> > + up_write(&brw->rw_sem);
>
> ACCESS_ONCE(brw->wstate) = WSTATE_NEED_MB;
>
> > +
> > + /* insert the barrier before the next fast-path in down_read */
> > + synchronize_sched();

But update_fast_ctr() should see mutex_is_locked(); obviously down_write()
must ensure this.

So update_fast_ctr() can execute the WSTATE_NEED_MB code only if it
races with

> ACCESS_ONCE(brw->wstate) = 0;
>
> > + mutex_unlock(&brw->writer_mutex);

these 2 stores and sees them in reverse order.



I guess that mutex_is_locked() in update_fast_ctr() looks a bit confusing.
It means no-fast-path for the reader; we could use ->state instead.

And even ->writer_mutex should go away if we want to optimize the
write-contended case, but I think this needs another patch on top of
this initial implementation.

Oleg.
