Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator

From: Pavel Begunkov
Date: Tue Nov 14 2023 - 11:11:10 EST


On 11/11/23 17:19, David Ahern wrote:
On 11/10/23 7:26 AM, Pavel Begunkov wrote:
On 11/7/23 23:03, Mina Almasry wrote:
On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@xxxxxxxxxx> wrote:

On 11/7/23 3:10 PM, Mina Almasry wrote:
On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@xxxxxxxxxx> wrote:

On 11/5/23 7:44 PM, Mina Almasry wrote:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eeeda849115c..1c351c138a5b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
  };

  #ifdef CONFIG_DMA_SHARED_BUFFER
+struct page_pool_iov *
+netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
+void netdev_free_devmem(struct page_pool_iov *ppiov);

netdev_{alloc,free}_dmabuf?


Can do.

I say that because a dmabuf can be host memory, at least I am not aware
of a restriction that a dmabuf is device memory.


In my limited experience dma-buf is generally device memory, and
that's really its use case. CONFIG_UDMABUF is a driver that mocks
dma-buf with a memfd which I think is used for testing. But I can do
the rename, it's more clear anyway, I think.

config UDMABUF
         bool "userspace dmabuf misc driver"
         default n
         depends on DMA_SHARED_BUFFER
         depends on MEMFD_CREATE || COMPILE_TEST
         help
           A driver to let userspace turn memfd regions into dma-bufs.
           Qemu can use this to create host dmabufs for guest framebuffers.


Qemu is just a userspace process; it is in no way a special one.

Treating host memory as a dmabuf should radically simplify the io_uring
extension of this set.

I agree actually, and I was about to make that comment to David Wei's
series once I have the time.

David, your io_uring RX zerocopy proposal actually works with devmem
TCP, if you're inclined to do that instead, what you'd do roughly is
(I think):

That would be a Frankenstein's monster api with no good reason for it.

It brings a consistent API from a networking perspective.

io_uring should not need to be in the page pool and memory management
business. Have you or David coded up the re-use of the socket APIs with
dmabuf to see how much smaller it makes the io_uring change - or even
walked through from a theoretical perspective?

Yes, we did the mental exercise, which is why we're converting to pp.
I don't see many opportunities for reuse for the main data path,
potentially apart from using the iov format instead of pages.

If the goal is to minimise the amount of code, it can mimic the tcp
devmem api with netlink, ioctl-ish buffer return, but that'd be a
pretty bad api for io_uring, overly complicated and limiting
optimisation options. If not, then we have to do some buffer
management in io_uring, and I don't see anything wrong with that. It
shouldn't be a burden for networking if all that extra code is
contained in io_uring and only exposed via pp ops and following
the rules.

--
Pavel Begunkov