Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP

From: Pavel Begunkov
Date: Fri Dec 08 2023 - 15:35:33 EST


On 11/6/23 22:55, Willem de Bruijn wrote:
On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:

On 11/06, Willem de Bruijn wrote:
IMHO, we need a better UAPI to receive the tokens and give them back to
the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
but look dated and hacky :-(

We should either do some kind of user/kernel shared memory queue to
receive/return the tokens (similar to what Jonathan was doing in his
proposal?)

I'll take a look at Jonathan's proposal, sorry, I'm not immediately
familiar but I wanted to respond :-) But is the suggestion here to
build a new kernel-user communication channel primitive for the
purpose of passing the information in the devmem cmsg? IMHO that seems
like an overkill. Why add 100-200 lines of code to the kernel to add
something that can already be done with existing primitives? I don't
see anything concretely wrong with cmsg & setsockopt approach, and if
we switch to something I'd prefer to switch to an existing primitive
for simplicity?

The only other existing primitive to pass data outside of the linear
buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
preferred? Any other suggestions or existing primitives I'm not aware
of?

or bite the bullet and switch to io_uring.


IMO io_uring & socket support are orthogonal, and one doesn't preclude
the other. As you know we like to use sockets and I believe there are
issues with io_uring adoption at Google that I'm not familiar with
(and could be wrong). I'm interested in exploring io_uring support as
a follow up but I think David Wei will be interested in io_uring
support as well anyway.

I also disagree that we need to replace a standard socket interface
with something "faster", in quotes.

This interface is not the bottleneck to the target workload.

Replacing the synchronous sockets interface with something more
performant for workloads where it is, is an orthogonal challenge.
However we do that, I think that traditional sockets should continue
to be supported.

The feature may already even work with io_uring, as both recvmsg with
cmsg and setsockopt have io_uring support now.

I'm not really concerned with faster. I would prefer something cleaner :-)

Or maybe we should just have it documented. With some kind of path
towards beautiful world where we can create dynamic queues..

I suppose we just disagree on the elegance of the API.

The concise notification API returns tokens as a range for
compression, encoding as two 32-bit unsigned integers start + length.
It allows for even further batching by returning multiple such ranges
in a single call.

FWIW, nothing prevents io_uring from compressing ranges. The io_uring
zc RFC returns {offset, size} as well, though at the moment the would
lie in the same page.

This is analogous to the MSG_ZEROCOPY notification mechanism from
kernel to user.

The synchronous socket syscall interface can be replaced by something
asynchronous like io_uring. This already works today? Whatever

If you mean async io_uring recv, it does work. In short, internally
it polls the socket and then calls sock_recvmsg(). There is also a
feature that would make it return back to polling after sock_recvmsg()
and loop like this.

asynchronous ring-based API would be selected, io_uring or otherwise,
I think the concise notification encoding would remain as is.

Since this is an operation on a socket, I find a setsockopt the
fitting interface.

--
Pavel Begunkov