Re: [RFC PATCH v2 0/8] virtio/vsock: experimental zerocopy receive

From: Stefano Garzarella
Date: Mon Jun 13 2022 - 04:54:37 EST


On Thu, Jun 09, 2022 at 12:33:32PM +0000, Arseniy Krasnov wrote:
On 09.06.2022 11:54, Stefano Garzarella wrote:
Hi Arseniy,
I left some comments in the patches, and I'm adding something also here:
Thanks for the comments.

On Fri, Jun 03, 2022 at 05:27:56AM +0000, Arseniy Krasnov wrote:
                             INTRODUCTION

    Hello, this is an experimental implementation of virtio vsock zerocopy
receive, inspired by Eric Dumazet's TCP zerocopy receive. The API uses the
same idea: call 'mmap()' on the socket's descriptor, then every 'getsockopt()'
fills the provided vma area with pages of virtio RX buffers. After the
received data has been processed by the user, the pages must be freed with an
'madvise()' call with the MADV_DONTNEED flag set (if the user does not call
'madvise()', the next 'getsockopt()' will fail).
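
A minimal sketch of the resulting receive cycle from userspace could look as
follows. This is an illustration, not the RFC's actual interface: the
ZC_LEVEL/ZC_OPT option names, the zero-terminated header array and the 4 KiB
page size are all assumptions here, and 'copy_len' handling is omitted.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/socket.h>

    struct virtio_vsock_usr_hdr {     /* trimmed header, see DETAILS below */
            uint32_t length;
            uint32_t flags;
            uint32_t copy_len;
    };

    #define PAGE_SZ  4096UL           /* assuming 4 KiB pages */
    #define MAP_SZ   (64 * PAGE_SZ)   /* arbitrary window size */
    #define ZC_LEVEL SOL_SOCKET       /* placeholder level */
    #define ZC_OPT   0                /* placeholder option name */

    static void process(const void *data, uint32_t len, uint32_t flags)
    {
            printf("got %u bytes, flags 0x%x\n", len, flags);
    }

    static void recv_cycle(int fd)
    {
            /* Map a window over the socket once; pages come later. */
            char *area = mmap(NULL, MAP_SZ, PROT_READ, MAP_SHARED, fd, 0);

            if (area == MAP_FAILED)
                    return;

            for (;;) {
                    socklen_t len = MAP_SZ;

                    /* Each call inserts RX buffer pages into the vma. */
                    if (getsockopt(fd, ZC_LEVEL, ZC_OPT, area, &len) < 0)
                            break;

                    /* Page 0 holds the header array, data follows. */
                    struct virtio_vsock_usr_hdr *hdr = (void *)area;
                    char *data = area + PAGE_SZ;

                    /* Assumption: a zeroed header ends the array. */
                    for (; hdr->length; hdr++) {
                            process(data, hdr->length, hdr->flags);
                            /* One header may cover several pages. */
                            data += (hdr->length + PAGE_SZ - 1) /
                                    PAGE_SZ * PAGE_SZ;
                    }

                    /* Return pages, or the next getsockopt() fails. */
                    madvise(area, MAP_SZ, MADV_DONTNEED);
            }

            munmap(area, MAP_SZ);
    }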

If it is not too time-consuming, can we have a table/list comparing this with the TCP zerocopy?
You mean comparing the APIs in more detail?

Yes, maybe a comparison, from the user's point of view, of doing zero-copy with TCP and with VSOCK.
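
For reference, here is roughly what the user does on the TCP side (this is
the mainline TCP_ZEROCOPY_RECEIVE interface added by Eric Dumazet; only the
original three fields are shown, newer kernels extend the struct):

    #include <stdint.h>
    #include <linux/tcp.h>          /* struct tcp_zerocopy_receive */
    #include <netinet/in.h>         /* IPPROTO_TCP */
    #include <sys/socket.h>

    /* TCP describes one contiguous byte range per call: the user
     * passes in the mmap()ed address, the kernel reports how much
     * it mapped ('length') and how much must still be read() the
     * normal way ('recv_skip_hint').
     */
    static int tcp_zc_receive(int fd, void *addr, uint32_t want)
    {
            struct tcp_zerocopy_receive zc = {
                    .address = (uintptr_t)addr,
                    .length  = want,
            };
            socklen_t zc_len = sizeof(zc);

            return getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
                              &zc, &zc_len);
    }

The user-visible difference is that TCP returns one contiguous byte range per
call, while this RFC exports per-packet headers, so SOCK_SEQPACKET message
bounds can be preserved.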



                                DETAILS

    Here is exactly how the mapping with the mapped pages looks: the first
page of the mapping contains an array of trimmed virtio vsock packet headers
(each contains only the length of the data on the corresponding page(s) and a
'flags' field):

    struct virtio_vsock_usr_hdr {
        uint32_t length;   /* exact payload size covered by this header */
        uint32_t flags;    /* SOCK_SEQPACKET flags (message/record bounds) */
        uint32_t copy_len; /* see the 'v1->v2' part below */
    };

The 'length' field lets the user know the exact size of the payload within
each sequence of pages, and 'flags' lets the user handle SOCK_SEQPACKET flags
(such as message bounds or record bounds). The 'copy_len' field is described
below in the 'v1->v2' part. All other pages are data pages from the RX queue.

            Page 0      Page 1      Page N

    [ hdr1 .. hdrN ][ data ] .. [ data ]
          |        |       ^           ^
          |        |       |           |
          |        *-------------------*
          |                |
          |                |
          *----------------*

    Of course, a single header could represent an array of pages (when the
packet's buffer is bigger than one page). So here is an example of the
detailed mapping layout for some set of packets. Let's consider that we have
the following sequence of packets: 56 bytes, 4096 bytes and 8200 bytes. All
pages 0, 1, 2, 3, 4 and 5 will be inserted into the user's vma (the vma is
large enough).
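
Assuming 4 KiB pages, the layout would then be (reconstructed from the sizes
above; the 8200-byte packet spans three pages as 4096 + 4096 + 8):

    Page 0: [ hdr0 ][ hdr1 ][ hdr2 ] ...  (array of usr headers)
    Page 1: [   56 bytes of data ]  <- hdr0.length = 56
    Page 2: [ 4096 bytes of data ]  <- hdr1.length = 4096
    Page 3: [ 4096 bytes of data ]  \
    Page 4: [ 4096 bytes of data ]   } hdr2.length = 8200
    Page 5: [    8 bytes of data ]  /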

In order to have a "userspace polling-friendly approach" and reduce the number of syscalls, can we allow, for example, the userspace to mmap at least the first header before packets arrive?
Then the userspace can poll a flag or other fields in the header to understand that there are new packets.
You mean that, to avoid the 'poll()' syscall, the user will spin on some flag provided by the kernel in a mapped page? I think yes, this is OK. Also I think that I can avoid the 'madvise()' call to clear the memory mapping before each 'getsockopt()' - let 'getsockopt()' do the 'madvise()' job by removing the pages of the previous data. In this case only one system call is needed - 'getsockopt()'.

Yes, that's right. I mean to support both: poll() for interrupt-based applications, and the ability to actively poll a variable in the shared memory for applications that want to minimize latency.
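
To make that concrete, the low-latency variant could look like the sketch
below from userspace, assuming the first header page is premapped and the
kernel publishes a nonzero 'length' (or 'copy_len') when data arrives; both
points are assumptions, since the mechanism is only being discussed here:

    #include <stdint.h>

    struct virtio_vsock_usr_hdr {
            uint32_t length;
            uint32_t flags;
            uint32_t copy_len;
    };

    /* Spin on the first header instead of sleeping in poll().
     * 'volatile' keeps the compiler from hoisting the loads out of
     * the loop; a real implementation would also want an acquire
     * barrier and a cpu_relax()-style pause instruction.
     */
    static void wait_for_data(volatile struct virtio_vsock_usr_hdr *hdr)
    {
            while (hdr->length == 0 && hdr->copy_len == 0)
                    ;
    }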

Thanks,
Stefano