Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

From: Yunsheng Lin
Date: Tue Nov 14 2023 - 03:23:38 EST


+cc Christian, Jason and Willy

On 2023/11/14 7:05, Jakub Kicinski wrote:
> On Mon, 13 Nov 2023 05:42:16 -0800 Mina Almasry wrote:
>> You're doing exactly what I think you're doing, and what was nacked in RFC v1.
>>
>> You've converted 'struct page_pool_iov' to essentially become a
>> duplicate of 'struct page'. Then, you're casting page_pool_iov* into
>> struct page* in mp_dmabuf_devmem_alloc_pages(), then, you're calling
>> mm APIs like page_ref_*() on the page_pool_iov* because you've fooled
>> the mm stack into thinking dma-buf memory is a struct page.

Yes, something like the above, but I am not sure about the 'fooled the mm
stack into thinking dma-buf memory is a struct page' part, because:
1. We never let the 'struct page' for devmem leak out of the net stack,
thanks to the 'not kmap()able and not readable' checking in your patchset.
2. We initialize page->_refcount for devmem to one and it stays at one. We
never call page_ref_inc()/page_ref_dec()/get_page()/put_page() on it;
instead, we use the page pool's pp_frag_count to do reference counting for
devmem pages in patch 6.
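
To illustrate point 2, here is a minimal userspace sketch of the idea, not
the code from patch 6. The struct and helper names (devmem_page,
devmem_page_get(), devmem_page_put()) are hypothetical, and C11 atomics
stand in for the kernel's atomic helpers. It only shows that _refcount is
set once and left alone while pp_frag_count tracks the users:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields discussed above;
 * this is NOT the kernel's struct page. */
struct devmem_page {
	atomic_int _refcount;		/* set to 1 at init, never touched again */
	atomic_long pp_frag_count;	/* page pool's own user count */
};

static void devmem_page_init(struct devmem_page *p)
{
	atomic_init(&p->_refcount, 1);
	atomic_init(&p->pp_frag_count, 1);
}

/* Take a reference: only pp_frag_count changes. */
static void devmem_page_get(struct devmem_page *p)
{
	atomic_fetch_add(&p->pp_frag_count, 1);
}

/* Drop a reference; returns true when the caller dropped the last one
 * and the page can go back to its memory provider. */
static bool devmem_page_put(struct devmem_page *p)
{
	return atomic_fetch_sub(&p->pp_frag_count, 1) == 1;
}

/* Small self-check; returns 0 on success. */
int devmem_refcount_demo(void)
{
	struct devmem_page p;

	devmem_page_init(&p);
	devmem_page_get(&p);			/* two users now */
	assert(!devmem_page_put(&p));		/* one user left */
	assert(devmem_page_put(&p));		/* last user gone */
	assert(atomic_load(&p._refcount) == 1);	/* never moved */
	return 0;
}
```

Since nothing in this path calls get_page()/put_page(), the mm-side
refcount never changes from its initial value.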

>>
>> RFC v1 was almost exactly the same, except instead of creating a
>> duplicate definition of struct page, it just allocated 'struct page'
>> instead of allocating another struct that is identical to struct page
>> and casting it into struct page.

Perhaps it is more accurate to say this is something between RFC v1 and
RFC v3: it decouples the 'struct page' for devmem from the mm subsystem,
but still keeps mostly unified handling of normal memory and devmem in
the page pool and net stack.

The main difference between this patchset and RFC v1:
1. The mm subsystem is not supposed to see the 'struct page' for devmem
in this patchset. I guess we could say it is decoupled from the mm
subsystem, even though we still call PageTail()/page_ref_count()/
page_is_pfmemalloc() on the 'struct page' for devmem.

The main differences between this patchset and RFC v3:
1. It reuses the 'struct page' to get more unified handling of normal
pages and devmem pages in the net stack.
2. It relies on page->pp_frag_count to do reference counting.

>>
>> I don't think what you're doing here reverses the nacks I got in RFC
>> v1. You also did not CC any dma-buf or mm people on this proposal that
>> would bring up these concerns again.
>
> Right, but the mirror struct has some appeal to a non-mm person like
> myself. The problem IIUC is that this patch is the wrong way around, we
> should be converting everyone who can deal with non-host mem to struct
> page_pool_iov. Using page_address() on ppiov which hns3 seems to do in
> this series does not compute for me.

The hacky use of ppiov in hns3 is only there for some prototype testing,
so please ignore it.

>
> Then we can turn the existing non-iov helpers to be a thin wrapper with
> just a cast from struct page to struct page_pool_iov, and a call of the
> iov helper. Again - never cast the other way around.

I agree that a cast from struct page to struct page_pool_iov is allowed,
but a cast from struct page_pool_iov to struct page is not, if I
understand you correctly.
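
To check my understanding, here is a minimal userspace sketch of such a
thin wrapper. All names and fields are hypothetical stand-ins and not the
kernel definitions. To keep the cast well-defined in plain C, the sketch
makes the iov the first member of the stand-in 'struct page', which is a
simplification and differs from the mirror-struct layout discussed in this
thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel definitions. */
struct page_pool_iov {
	unsigned long pp_magic;
	uint64_t dma_addr;
	long pp_frag_count;
};

struct page {
	struct page_pool_iov ppiov;	/* must stay the first member */
	void *host_only;		/* stand-in for fields only host pages have */
};

/* The page -> iov cast below relies on the iov sitting at offset 0. */
_Static_assert(offsetof(struct page, ppiov) == 0,
	       "page_pool_iov must be the first member of struct page");

/* iov helper: valid for anything the page pool hands out. */
static inline uint64_t page_pool_iov_dma_addr(const struct page_pool_iov *ppiov)
{
	return ppiov->dma_addr;
}

/* Non-iov helper as a thin wrapper: cast page -> iov, then call the
 * iov helper. There is deliberately no helper casting iov -> page,
 * because an iov has no host_only part to look at. */
static inline uint64_t page_pool_page_dma_addr(const struct page *page)
{
	return page_pool_iov_dma_addr((const struct page_pool_iov *)page);
}

/* Small self-check; returns 0 on success. */
int ppiov_wrapper_demo(void)
{
	struct page pg = { .ppiov = { .dma_addr = 0x1000 } };

	assert(page_pool_page_dma_addr(&pg) == 0x1000);
	return 0;
}
```

In this shape every page is usable as an iov, but an iov that came from
devmem is not usable as a page, which is how I read the 'never cast the
other way around' rule.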

Before we can also completely decouple the 'struct page' allocated
directly from the buddy allocator from the mm subsystem in the netstack,
below is what I have in mind to support different memory providers.

                                 +--------------+
                                 |   Netstack   |
                                 |'struct page' |
                                 +--------------+
                                         ^
                                         |
                                         |
                                         v
                              +---------------------+
+----------------------+      |                     |      +---------------+
|      devmem MP       |<---->|      Page pool      |----->|    **** MP    |
|'struct page_pool_iov'|      |    'struct page'    |      |'struct **_iov'|
+----------------------+      |                     |      +---------------+
                              +---------------------+
                                         ^
                                         |
                                         |
                                         v
                                 +---------------+
                                 |    Driver     |
                                 | 'struct page' |
                                 +---------------+

I would expect the net stack, page pool and driver to still see the
'struct page'; only a memory provider sees its own specific struct. In the
diagram above, the devmem memory provider sees the 'struct page_pool_iov'.

The reason I still expect the driver to see the 'struct page' is that the
driver will still need to support normal memory besides devmem.

>
> Also I think this conversion can be done completely separately from the
> mem provider changes. Just add struct page_pool_iov and start using it.

I am not sure I understand what "Just add struct page_pool_iov and
start using it" means yet.

>
> Does that make more sense?
>
> .
>