Re: [PATCH 19/19] mm: filemap: Avoid unnecessary barriers and waitqueue lookups in unlock_page fastpath

From: Paul E. McKenney
Date: Tue May 13 2014 - 16:25:05 EST


On Tue, May 13, 2014 at 08:57:42PM +0200, Oleg Nesterov wrote:
> On 05/13, Paul E. McKenney wrote:
> >
> > On Tue, May 13, 2014 at 05:44:35PM +0200, Peter Zijlstra wrote:
> > >
> > > Ah, yes, so I'll defer to Oleg and Linus to explain that one. As per the
> > > name: smp_mb__before_spinlock() should of course imply a full barrier.
> >
> > How about if I queue a name change to smp_wmb__before_spinlock()?
>
> I agree, this is more accurate, simply because it describes what it
> actually does.
>
> But just in case: try_to_wake_up() does not actually need a wmb()
> between "CONDITION = T" and "task->state = RUNNING". It would be
> fine if these two STOREs were reordered; we can rely on rq->lock.
>
> What it actually needs is a barrier between "CONDITION = T" and the
> "task->state & state" check. But since we do not have a store-load
> barrier, wmb() was added to ensure that "CONDITION = T" can't leak
> into the critical section.
>
> But it seems that set_tlb_flush_pending() already assumes that it
> acts as wmb(), so probably smp_wmb__before_spinlock() is fine.
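
To spell out the pattern being discussed (a sketch only, not the
actual kernel code; "CONDITION" stands for whatever the waiter is
blocking on, and "p" and "state" are as in try_to_wake_up()):

	/* Waiter: */
	set_current_state(TASK_UNINTERRUPTIBLE);	/* implies smp_mb() */
	if (!CONDITION)
		schedule();

	/* Waker, the try_to_wake_up() path: */
	CONDITION = T;
	smp_mb__before_spinlock();	/* the barrier in question */
	raw_spin_lock(&p->pi_lock);
	if (!(p->state & state))	/* did the waiter go to sleep? */
		goto out;		/* no: nothing to wake */
	/* ... proceed to set p->state = TASK_RUNNING ... */

If the waker's "CONDITION = T" store can pass its "p->state & state"
load, the waker can see the waiter's old state while the waiter sees
the old CONDITION, and the wakeup is lost. That store-against-load
ordering is exactly what smp_wmb() by itself does not provide.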

Except that when I go to make the change, I find the following in
the documentation:

    Memory operations issued before the ACQUIRE may be completed after
    the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
    combined with a following ACQUIRE, orders prior loads against
    subsequent loads and stores and also orders prior stores against
    subsequent stores. Note that this is weaker than smp_mb()! The
    smp_mb__before_spinlock() primitive is free on many architectures.

Which means that either the documentation is wrong or the implementation
is. Yes, smp_wmb() has the semantics called out above on many platforms,
but not on Alpha or ARM.
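
For reference, the generic fallback (include/linux/spinlock.h) is
just this, with architectures allowed to override it:

	#ifndef smp_mb__before_spinlock
	#define smp_mb__before_spinlock()	smp_wmb()
	#endif

So the "orders prior loads against subsequent loads and stores" part
of the documentation above is simply not provided by the default
implementation on the weakly ordered architectures.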

So, as you say, set_tlb_flush_pending() only relies on smp_wmb().
The comment in try_to_wake_up() seems to be assuming a full memory
barrier. The comment in __schedule() also seems to be relying on
a full memory barrier (prior write against subsequent read). Yow!
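
For reference again, the two fragments I am reading in
kernel/sched/core.c are roughly:

	/* try_to_wake_up(): */
	/*
	 * If we are going to wake up a thread waiting for CONDITION we
	 * need to ensure that CONDITION=1 done by the caller can not be
	 * reordered with p->state check below. This pairs with mb() in
	 * set_current_state() the waiting thread does.
	 */
	smp_mb__before_spinlock();
	raw_spin_lock_irqsave(&p->pi_lock, flags);

	/* __schedule(): */
	/*
	 * Make sure that signal_pending_state()->signal_pending() below
	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
	 * done by the caller to avoid the race with signal_wake_up().
	 */
	smp_mb__before_spinlock();
	raw_spin_lock_irq(&rq->lock);

Both comments promise that a prior STORE is ordered against a LOAD
inside the critical section, which smp_wmb() alone cannot guarantee.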

So maybe barrier() on TSO systems like x86 and mainframe, and stronger
barriers on other systems, depending on what their lock acquisition
looks like?
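
Something like the following sketch, say. (The barrier() override is
my assumption that the atomic operation in those architectures' lock
acquisition is already a full barrier, so a prior STORE cannot be
reordered with a LOAD inside the critical section; each architecture
would need to be audited before doing this for real.)

	/* Generic fallback: play it safe on weakly ordered machines. */
	#ifndef smp_mb__before_spinlock
	#define smp_mb__before_spinlock()	smp_mb()
	#endif

	/* Possible x86/s390 override in the arch headers: */
	#define smp_mb__before_spinlock()	barrier()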

Or am I misinterpreting try_to_wake_up() and __schedule()?

Thanx, Paul
