Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)

From: Mina Almasry
Date: Sun Jul 16 2023 - 21:54:02 EST


On Fri, Jul 14, 2023 at 8:55 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Fri, Jul 14, 2023 at 07:55:15AM -0700, Mina Almasry wrote:
>
> > Once the skb frags with struct new_abstraction are in the TCP stack,
> > they will need some special handling in code accessing the frags. But
> > my RFC already addressed that somewhat because the frags were
> > inaccessible in that case. In this case the frags will be both
> > inaccessible and will not be struct pages at all (things like
> > get_page() will not work), so more special handling will be required,
> > maybe.
>
> It seems sort of reasonable, though there will be interesting concerns
> about coherence and synchronization with generial purpose DMABUFs that
> will need tackling.
>
> Still it is such a lot of churn and weridness in the netdev side, I
> think you'd do well to present an actual full application as
> justification.
>
> Yes, you showed you can stick unordered TCP data frags into GPU memory
> sort of quickly, but have you gone further with this to actually show
> it is useful for a real world GPU centric application?
>
> BTW your cover letter said 96% utilization, the usual server
> configuation is one NIC per GPU, so you were able to hit 1500Gb/sec of
> TCP BW with this?
>

I do notice that the number of NICs is missing from our public
documentation so far, so I will refrain from specifying how many NICs
are on those A3 VMs until the information is public. But I think I can
confirm that your general thinking is correct, the perf that we're
getting is 96.6% line rate of each GPU/NIC pair, and scales linearly
for each NIC/GPU pair we've tested with so far. Line rate of each
NIC/GPU pair is 200 Gb/sec.

So if we have 8 NIC/GPU pairs we'd be hitting 96.6% * 200 * 8 = 1545 GB/sec.
If we have, say, 2 NIC/GPU pairs, we'd be hitting 96.6% * 200 * 2 = 384 GB/sec
...
etc.

--
Thanks,
Mina