Re: [PATCH -tip 2/3] sched/wake_q: Relax to acquire semantics

From: Peter Zijlstra
Date: Tue Sep 15 2015 - 05:50:00 EST


On Mon, Sep 14, 2015 at 02:08:06PM -0700, Davidlohr Bueso wrote:
> On Mon, 14 Sep 2015, Peter Zijlstra wrote:
>
> >On Mon, Sep 14, 2015 at 12:37:23AM -0700, Davidlohr Bueso wrote:
> >> /*
> >>+ * Atomically grab the task. If ->wake_q is non-nil (failed cmpxchg)
> >>+ * then the task is already queued (by us or someone else) and will
> >>+ * get the wakeup due to that.
> >> *
> >>+ * Use acquire semantics to add the next pointer, which pairs with the
> >>+ * write barrier implied by the wakeup in wake_up_list().
> >> */
> >>+ if (cmpxchg_acquire(&node->next, NULL, WAKE_Q_TAIL))
> >> return;
> >>
> >> get_task_struct(task);
> >
> >I'm not seeing a _why_ on the acquire semantics. Not saying the patch is
> >wrong, just saying I want words on why acquire is correct.
>
> Well, I was just taking advantage of removing the upper barrier. Considering
> that the formal semantics, you are right that we need not actual acquire per-se
> (ie for node->next) but instead merely ensure a barrier in wake_q_add(). This is
> kind of why I had hinted of going full _relaxed(). We could also rephrase the
> comment, something like:
>
> * Use ACQUIRE semantics to add the next pointer, such that
> * wake_q_add() implies a full barrier. This pairs with the
> * write barrier implied by the wakeup in wake_up_list().
> */
>
> What do you think?

Still befuddled. I'm thinking that if you want to remove a barrier,
you'd remove the second one and keep the first. That is, RELEASE.

That way, you know the stores prior to the wake queue are done by the
time you observe the queued entry, and therefore (transitively) know
those stores are done by the time you do the actual wakeup.
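
To make that pairing concrete, here is a minimal userspace sketch using C11
atomics. It is not kernel code and not the patch under discussion: struct
waiter, queue_side() and wake_side() are made-up names, the 'queued' flag
merely stands in for the wake queue entry, and C11 release/acquire is assumed
only as an approximation of the kernel primitives. It also does not model the
third party (the woken task), which is where the transitivity question below
comes in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in: 'queued' plays the role of the wake queue entry. */
struct waiter {
	int result;          /* plain store done before queueing */
	_Atomic int queued;  /* publication flag */
};

/* Queueing side: the RELEASE store orders the earlier plain store before it. */
static void queue_side(struct waiter *w, int value)
{
	assert(w != NULL);
	w->result = value;
	atomic_store_explicit(&w->queued, 1, memory_order_release);
}

/*
 * Waking side: if the ACQUIRE load observes the queued entry, the store to
 * w->result is guaranteed to be visible as well. Returns -1 if not queued.
 */
static int wake_side(struct waiter *w)
{
	assert(w != NULL);
	if (!atomic_load_explicit(&w->queued, memory_order_acquire))
		return -1;
	return w->result;
}
```

Calling wake_side() before queue_side() yields -1; once the entry is observed,
the value stored before the RELEASE is the one read back.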

Two issues with that though; firstly RELEASE is not actually guaranteed
to be transitive -- now the only arch that does not implement it with a
full barrier is ARGH64, so we could just ask Will, but I'm not sure it's
'good' to start relying on this.

Secondly, the wake queues are not concurrent, they're in context, so I
don't see ordering matter at all. The only reason it's a cmpxchg() is
because there is the (small) possibility of two contexts wanting to wake
the same task, and we use task_struct storage for the queue.
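
As an illustration of that last point, a rough userspace sketch of the
claim-by-cmpxchg step in C11. The struct and macro names mirror the ones
quoted above, but wake_q_init_sketch() and wake_q_add_sketch() are
hypothetical, the task refcounting is left out, and the default (sequentially
consistent) compare-exchange is used as a stand-in for the fully ordered
cmpxchg(); which ordering is actually required is exactly the open question
here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* In the kernel this node lives inside task_struct; here it stands alone. */
struct wake_q_node {
	_Atomic(struct wake_q_node *) next;
};

/* One head per waking context; it is never shared between contexts. */
struct wake_q_head {
	_Atomic(struct wake_q_node *) first;
	_Atomic(struct wake_q_node *) *lastp;
};

/* Sentinel meaning "queued, last entry"; any non-NULL value would do. */
#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

static void wake_q_init_sketch(struct wake_q_head *head)
{
	atomic_init(&head->first, WAKE_Q_TAIL);
	head->lastp = &head->first;
}

/* Returns true if this context queued the node, false if it was already queued. */
static bool wake_q_add_sketch(struct wake_q_head *head, struct wake_q_node *node)
{
	struct wake_q_node *expected = NULL;

	assert(head != NULL && node != NULL);

	/* Only the context that swaps NULL -> WAKE_Q_TAIL owns the node. */
	if (!atomic_compare_exchange_strong(&node->next, &expected, WAKE_Q_TAIL))
		return false;

	/* Link the node at the tail of this context's private list. */
	atomic_store(head->lastp, node);
	head->lastp = &node->next;
	return true;
}
```

If two contexts each call wake_q_add_sketch() on the same node with their own
head, only the first succeeds; the second sees a non-NULL next pointer and
backs off, leaving its own list untouched.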

Or am I mistaken and do we have concurrent users of wake queues?