Re: [PATCH v3 06/13] epoll: introduce helpers for adding/removing events to uring

From: Peter Zijlstra
Date: Fri May 31 2019 - 12:55:31 EST


On Fri, May 31, 2019 at 04:21:30PM +0200, Roman Penyaev wrote:

> ep_add_event_to_uring() is lockless, so I can't simply increase tail
> afterwards: I first need to reserve the index slot to write to. I can use a
> shadow tail, which is not seen by userspace, but I have to guarantee that tail
> is updated from the shadow tail only *after* all callers of
> ep_add_event_to_uring() have left. That is possible, please see the code
> below, but it adds more complexity:
>
> (the code was tested in user space, hence the C11-style atomics)
>
> static inline void add_event__kernel(struct ring *ring, unsigned bit)
> {
> 	unsigned i, cntr, commit_cntr, *item_idx, tail, old;
>
> 	i = __atomic_fetch_add(&ring->cntr, 1, __ATOMIC_ACQUIRE);
> 	item_idx = &ring->user_itemsindex[i % ring->nr];
>
> 	/* Update data */
> 	*item_idx = bit;
>
> 	commit_cntr = __atomic_add_fetch(&ring->commit_cntr, 1,
> 					 __ATOMIC_RELEASE);
>
> 	tail = ring->user_header->tail;
> 	rmb();
> 	do {
> 		cntr = ring->cntr;
> 		if (cntr != commit_cntr)
> 			/* Someone else will advance tail */
> 			break;
>
> 		old = tail;
>
> 	} while ((tail =
> 		  __sync_val_compare_and_swap(&ring->user_header->tail,
> 					      old, cntr)) != old);
> }

Yes, I'm well aware of that particular problem (see
kernel/events/ring_buffer.c:perf_output_put_handle for instance). But
like you show, it can be done. It also makes the thing wait-free, as
opposed to merely lockless.
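
For reference, the reserve/commit/publish scheme discussed above can be
sketched in portable C11 roughly as follows. This is only a user-space
illustration, not the patch's code: struct ring, RING_NR, ring_init() and
ring_add_event() are hypothetical names, the ring size is arbitrary, and the
sketch does not check for a full ring or for counter wrap-around.

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_NR 8u	/* hypothetical ring size */

/* Hypothetical user-space stand-in for the ring in the quoted code. */
struct ring {
	atomic_uint cntr;		/* shadow tail: slots reserved */
	atomic_uint commit_cntr;	/* slots fully written */
	atomic_uint tail;		/* tail visible to the consumer */
	unsigned items[RING_NR];
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->cntr, 0);
	atomic_init(&r->commit_cntr, 0);
	atomic_init(&r->tail, 0);
}

static void ring_add_event(struct ring *r, unsigned bit)
{
	unsigned i, cntr, commit, tail;

	/* Reserve a slot by bumping the shadow tail. */
	i = atomic_fetch_add_explicit(&r->cntr, 1, memory_order_acquire);
	r->items[i % RING_NR] = bit;

	/* Commit the slot; fetch_add returns the old value, hence the +1. */
	commit = atomic_fetch_add_explicit(&r->commit_cntr, 1,
					   memory_order_release) + 1;

	tail = atomic_load(&r->tail);
	do {
		cntr = atomic_load(&r->cntr);
		if (cntr != commit)
			/* Someone else will advance tail. */
			break;
		/* A failed CAS refreshes 'tail'; cntr is then re-checked. */
	} while (!atomic_compare_exchange_weak(&r->tail, &tail, cntr));
}
```

With a single producer every call finds cntr equal to its own commit count,
so tail advances by one per event. With concurrent producers only a caller
that observes no outstanding reservations attempts to move tail forward.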