Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support

From: Stefano Garzarella
Date: Wed Feb 24 2021 - 03:39:54 EST


On Wed, Feb 24, 2021 at 11:28:50AM +0300, Arseny Krasnov wrote:

On 24.02.2021 11:23, Stefano Garzarella wrote:
On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote:
On 23.02.2021 17:50, Stefano Garzarella wrote:
On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:
Hi Arseny,

On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:
This patchset implements support of SOCK_SEQPACKET for virtio
transport.
SOCK_SEQPACKET guarantees that record boundaries are preserved. To
do that, two new packet operations were added: one to mark the start of
a record and one to mark the end (SEQ_BEGIN and SEQ_END below). Both
operations carry metadata to maintain boundaries and payload
integrity. The metadata is a special header with two
fields, message count and message length:

struct virtio_vsock_seq_hdr {
	__le32 msg_cnt;
	__le32 msg_len;
} __attribute__((packed));

This header is transmitted as payload of SEQ_BEGIN and SEQ_END
packets (buffer of the second virtio descriptor in the chain) in the same
way as data is transmitted in RW packets. Payload was chosen as the buffer
for this header to avoid touching the first virtio buffer, which carries
the packet header, because someone could check that the size of this buffer
is equal to the size of the packet header. To send a record, a packet with
the start marker is sent first (its header contains the length of the record
and the counter), then the counter is incremented and all data is sent as
usual 'RW' packets, and finally SEQ_END is sent (it also carries the message
counter, which is the counter of SEQ_BEGIN + 1). After sending SEQ_END the
counter is incremented again. On the receiver's side, the length of the
record is known from the packet with the start-of-record marker. To check
that no packets were dropped by the transport, the counters of each
SEQ_BEGIN and the SEQ_END that follows it are checked (the counter of
SEQ_END must be bigger than the counter of SEQ_BEGIN by 1) and the length
of data between the two markers is compared to the length in the
SEQ_BEGIN header.
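
The receiver-side check described above can be sketched roughly as follows.
This is a minimal userspace illustration, not code from the patchset: the
struct seq_hdr_host type and the seq_record_is_valid() helper are
hypothetical names, the fields are assumed to be already converted from
little-endian to host order, and counter wraparound modulo 2^32 is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order copy of the two header fields (msg_cnt, msg_len). */
struct seq_hdr_host {
	uint32_t msg_cnt;
	uint32_t msg_len;
};

/*
 * Hypothetical helper: returns true if the record framed by one SEQ_BEGIN
 * and the SEQ_END that follows it looks intact. rw_bytes_seen is the number
 * of payload bytes received in RW packets between the two markers.
 */
static bool seq_record_is_valid(const struct seq_hdr_host *begin,
				const struct seq_hdr_host *end,
				uint32_t rw_bytes_seen)
{
	assert(begin && end);

	/* SEQ_END must carry the SEQ_BEGIN counter plus one
	 * (assumed here to wrap modulo 2^32). */
	if (end->msg_cnt != (uint32_t)(begin->msg_cnt + 1))
		return false;

	/* Data seen between the markers must match the announced length. */
	return rw_bytes_seen == begin->msg_len;
}
```

A receiver built along these lines would discard the record, or report an
error to the user, whenever the helper returns false.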
Since packets of one socket are not reordered on either the
vsock or the vhost transport layer, such markers allow the original
record to be restored on the receiver's side. If the user's buffer is
smaller than the record length, then all data beyond the buffer size is
dropped.
As with stream sockets, the maximum length of a datagram is not limited,
because the same credit logic is used. The difference from stream sockets
is that the user is not woken up until the whole record is received or an
error occurs. The implementation also supports the 'MSG_EOR' and
'MSG_TRUNC' flags. Tests are also implemented.
I reviewed the first part (af_vsock.c changes), tomorrow I'll review
the rest. That part looks great to me, only found a few minor issues.
I reviewed the rest of it as well, left a few minor comments, but I
think we're well on track.

I'll take a better look at the specification patch tomorrow.
Great, Thank You
Thanks,
Stefano

In the meantime, however, I'm getting a doubt, especially with regard
to other transports besides virtio.

Should we hide the begin/end marker sending in the transport?

I mean, should the transport just provide a seqpacket_enqueue()
callback?
Inside it then the transport will send the markers. This is because
some transports might not need to send markers.

But thinking about it more, they could actually implement stubs for
those calls, if they don't need to send markers.

So I think for now it's fine since it allows us to reuse a lot of
code, unless someone has some objection.
I thought about that, I'll try to implement it in next version. Let's see...
If you want to discuss it first, write down the idea you want to
implement, I wouldn't want to make you do unnecessary work. :-)

The idea is simple: in the iov iterator of 'struct msghdr', which is passed
to the enqueue callback, we have two fields: 'iov_offset', which is the byte
offset inside the io vector where the next data must be picked, and 'count',
which is the number of unprocessed bytes left in the io vector. So in the
seqpacket enqueue callback, if 'iov_offset' is 0 I'll send SEQBEGIN, and if
'count' is 0 I'll send SEQEND.
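
A rough userspace sketch of that marker logic, for discussion only. The
struct toy_iter type, the toy_seqpacket_enqueue() function and the
SENT_BEGIN/SENT_END flags are hypothetical names; the toy iterator only
mirrors the two fields mentioned above and models a single contiguous
buffer, and the chunk argument stands in for however many bytes the
transport accepts in one call.

```c
#include <stddef.h>

/* Hypothetical flags reporting which markers one enqueue call emitted. */
enum { SENT_BEGIN = 1, SENT_END = 2 };

/* Toy stand-in for the iterator: only the two fields discussed above. */
struct toy_iter {
	size_t iov_offset;	/* bytes already consumed from the buffer */
	size_t count;		/* bytes still to be sent */
};

/*
 * Hypothetical enqueue callback: consumes up to 'chunk' bytes and returns
 * a bitmask of the markers that would be sent around this chunk.
 */
static int toy_seqpacket_enqueue(struct toy_iter *it, size_t chunk)
{
	int markers = 0;
	size_t n;

	/* Nothing consumed yet: this is the first chunk of the record. */
	if (it->iov_offset == 0)
		markers |= SENT_BEGIN;

	/* "Send" the data by advancing the iterator. */
	n = it->count < chunk ? it->count : chunk;
	it->iov_offset += n;
	it->count -= n;

	/* Nothing left: this was the last chunk of the record. */
	if (it->count == 0)
		markers |= SENT_END;

	return markers;
}
```

With a 10-byte record and a 4-byte chunk, the first call reports only the
begin marker, the second reports none, and the third reports only the end
marker, so the vsock core never has to send markers itself.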


Got it, makes sense, and it's definitely more transparent for the vsock core!
Go ahead, maybe adding a comment in the vsock core explaining this, so other developers can understand better if they want to support SEQPACKET in other transports.

Thanks,
Stefano