Re: [RFC PATCH v3 05/12] netdev: netdevice devmem allocator

From: David Ahern
Date: Sat Nov 11 2023 - 12:19:28 EST


On 11/10/23 7:26 AM, Pavel Begunkov wrote:
> On 11/7/23 23:03, Mina Almasry wrote:
>> On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>>>
>>> On 11/7/23 3:10 PM, Mina Almasry wrote:
>>>> On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>>>>>
>>>>> On 11/5/23 7:44 PM, Mina Almasry wrote:
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index eeeda849115c..1c351c138a5b 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
>>>>>>   };
>>>>>>
>>>>>>   #ifdef CONFIG_DMA_SHARED_BUFFER
>>>>>> +struct page_pool_iov *
>>>>>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
>>>>>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
>>>>>
>>>>> netdev_{alloc,free}_dmabuf?
>>>>>
>>>>
>>>> Can do.
>>>>
>>>>> I say that because a dmabuf can be host memory; at least, I am not
>>>>> aware of a restriction that a dmabuf must be device memory.
>>>>>
>>>>
>>>> In my limited experience, dma-buf is generally device memory, and
>>>> that's really its use case. CONFIG_UDMABUF is a driver that mocks a
>>>> dma-buf with a memfd, which I think is used for testing. But I can do
>>>> the rename; it's clearer anyway, I think.
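
For clarity, the rename I was suggesting is just the declarations from
the hunk above with s/devmem/dmabuf/. Untested sketch; the stubs for the
!CONFIG_DMA_SHARED_BUFFER case are my assumption, in case the series
does not already provide them:

#ifdef CONFIG_DMA_SHARED_BUFFER
struct page_pool_iov *
netdev_alloc_dmabuf(struct netdev_dmabuf_binding *binding);
void netdev_free_dmabuf(struct page_pool_iov *ppiov);
#else
/* assumed stubs so callers compile without dma-buf support */
static inline struct page_pool_iov *
netdev_alloc_dmabuf(struct netdev_dmabuf_binding *binding)
{
	return NULL;
}

static inline void netdev_free_dmabuf(struct page_pool_iov *ppiov)
{
}
#endif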
>>>
>>> config UDMABUF
>>>          bool "userspace dmabuf misc driver"
>>>          default n
>>>          depends on DMA_SHARED_BUFFER
>>>          depends on MEMFD_CREATE || COMPILE_TEST
>>>          help
>>>            A driver to let userspace turn memfd regions into dma-bufs.
>>>            Qemu can use this to create host dmabufs for guest
>>> framebuffers.
>>>
>>>
>>> Qemu is just a userspace process; it is no way a special one.
>>>
>>> Treating host memory as a dmabuf should radically simplify the io_uring
>>> extension of this set.
>>
>> I agree, actually, and I was about to make that comment on David Wei's
>> series once I have the time.
>>
>> David, your io_uring RX zerocopy proposal actually works with devmem
>> TCP; if you're inclined to do that instead, what you'd do roughly is
>> (I think):
> That would be a Frankenstein's monster API with no good reason for it.

It brings a consistent API from a networking perspective.

io_uring should not need to be in the page pool and memory management
business. Have you or David coded up the reuse of the socket APIs with a
dmabuf to see how much smaller it makes the io_uring change - or even
walked through it from a theoretical perspective?