Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing events to uring

From: Roman Penyaev
Date: Fri May 31 2019 - 07:19:09 EST


On 2019-05-31 11:56, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 10:58:03AM +0200, Roman Penyaev wrote:
>> +static inline bool ep_add_event_to_uring(struct epitem *epi, __poll_t pollflags)
>> +{
>> +	struct eventpoll *ep = epi->ep;
>> +	struct epoll_uitem *uitem;
>> +	bool added = false;
>> +
>> +	if (WARN_ON(!pollflags))
>> +		return false;
>> +
>> +	uitem = &ep->user_header->items[epi->bit];
>> +	/*
>> +	 * Can be represented as:
>> +	 *
>> +	 *    was_ready = uitem->ready_events;
>> +	 *    uitem->ready_events &= ~EPOLLREMOVED;
>> +	 *    uitem->ready_events |= pollflags;
>> +	 *    if (!was_ready) {
>> +	 *         // create index entry
>> +	 *    }
>> +	 *
>> +	 * See the big comment inside ep_remove_user_item(), why it is
>> +	 * important to mask EPOLLREMOVED.
>> +	 */
>> +	if (!atomic_or_with_mask(&uitem->ready_events,
>> +				 pollflags, EPOLLREMOVED)) {
>> +		unsigned int i, *item_idx, index_mask;
>> +
>> +		/*
>> +		 * Item was not ready before, thus we have to insert
>> +		 * new index to the ring.
>> +		 */
>> +
>> +		index_mask = ep_max_index_nr(ep) - 1;
>> +		i = __atomic_fetch_add(&ep->user_header->tail, 1,
>> +				       __ATOMIC_ACQUIRE);

> afaict __atomic_fetch_add() does not exist.

That is a GCC extension. I did not find any API that just increments
a variable atomically without declaring it as, or casting it to, an
atomic type. What is the proper way to achieve that?


>> +		item_idx = &ep->user_index[i & index_mask];
>> +
>> +		/* Signal with a bit, which is > 0 */
>> +		*item_idx = epi->bit + 1;

> Did you just increment the user visible tail pointer before you filled
> the data? That is, can the concurrent userspace observe the increment
> before you put credible data in its place?

No, the "data" is the "ready_events" mask, which has already been
updated by the cmpxchg-based atomic_or_with_mask() call. All I need
to do here is put the index of the just-updated item into the uring.

Userspace, in turn, takes the index from the ring and then checks
the mask.


>> +
>> +		/*
>> +		 * Want index update be flushed from CPU write buffer and
>> +		 * immediately visible on userspace side to avoid long busy
>> +		 * loops.
>> +		 */
>> +		smp_wmb();

> That's still complete nonsense.

Yes, true. My confusion came from a simple test in which one thread
swaps pointers in a loop while another thread dereferences the
pointer and increments the variable it points to:

THR#0
-----------

unsigned vvv1 = 0, vvv2 = 0;
unsigned *ptr;

ptr = &vvv1;
thr_level2 = &vvv2;

while (!stop) {
	unsigned *tmp = *thr_level2;
	*thr_level2 = ptr;
	barrier(); <<<< ????
	ptr = tmp;
}

THR#1
-----------

while (!stop) {
	ptr = thr_level2;
	(*ptr)++;
}


In the end I expected `vvv1` and `vvv2` to be incremented roughly
equally. But without barrier(), only one of the two variables is
incremented.

Now I see that barrier() is defined as a plain compiler barrier,
asm volatile("" ::: "memory"), and has nothing to do with the CPU
write buffer, contrary to what I wrote in the comment.

So it is indeed garbage and can be removed. Thanks.

--
Roman