Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket

From: Jason Wang
Date: Mon May 13 2019 - 23:41:58 EST



On 2019/5/14 11:25, Jason Wang wrote:

On 2019/5/14 1:23, Stefano Garzarella wrote:
On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
On 2019/5/10 8:58, Stefano Garzarella wrote:
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring have been queued directly
in a per-socket list, avoiding a copy.
These buffers are preallocated by the guest with a fixed
size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per socket is 256 KB, but if we use
only 1 byte of payload per packet, the guest can queue up to 262144
4 KB buffers, using up to 1 GB of memory per socket. In addition,
the guest will continue to fill the vring with new 4 KB free
buffers to avoid starvation of its sockets.

This patch solves the issue by copying the payload into a new
buffer, which is then queued in the per-socket list, so the 4 KB
buffer used by the host can be freed.

In this way, the memory used by each socket respects the available
credit, and we still avoid starvation, at the cost of an extra
memory copy. When the buffer is completely full we do a
"zero-copy", moving the buffer directly into the per-socket list.

I wonder if in the long run we should use the generic socket accounting
mechanism provided by the kernel (e.g. socket, skb, sndbuf, rcvbuf, truesize)
instead of a vsock-specific thing, to avoid duplicating effort.
I agree, the idea is to switch to sk_buff, but this would require a huge
change. If we use the virtio-net datapath, it will become simpler.


Yes, the unix domain socket is one example that uses the general skb and socket structures. And we probably need some kind of socket pair on the host. Using sockets can also simplify the unification with vhost-net, which depends on the socket proto_ops to work. I admit it's probably a huge change, but we can do it gradually.



Signed-off-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>
---
 drivers/vhost/vsock.c                   |  2 +
 include/linux/virtio_vsock.h            |  8 +++
 net/vmw_vsock/virtio_transport.c        |  1 +
 net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
 4 files changed, 81 insertions(+), 25 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bb5fc0e9fbc2..7964e2daee09 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
+	pkt->buf_len = pkt->len;
+
 	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
 	if (nbytes != pkt->len) {
 		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index e223e2632edd..345f04ee9193 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
 	void *buf;
 	u32 len;
 	u32 off;
+	u32 buf_len;
 	bool reply;
 };
 
+struct virtio_vsock_buf {
+	struct list_head list;
+	void *addr;
+	u32 len;
+	u32 off;
+};
+
 struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 15eb5d3d4750..af1d2ce12f54 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			break;
 		}
 
+		pkt->buf_len = buf_len;
 		pkt->len = buf_len;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 602715fc9a75..0248d6808755 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 		pkt->buf = kmalloc(len, GFP_KERNEL);
 		if (!pkt->buf)
 			goto out_pkt;
+
+		pkt->buf_len = len;
+
 		err = memcpy_from_msg(pkt->buf, info->msg, len);
 		if (err)
 			goto out;
@@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 	return NULL;
 }
+static struct virtio_vsock_buf *
+virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
+{
+	struct virtio_vsock_buf *buf;
+
+	if (pkt->len == 0)
+		return NULL;
+
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+	if (!buf)
+		return NULL;
+
+	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
+	 * the new virtio_vsock_buf avoiding the copy, because we are sure
+	 * that we are not using more memory than that counted by the credit
+	 * mechanism.
+	 */
+	if (zero_copy && pkt->len == pkt->buf_len) {
+		buf->addr = pkt->buf;
+		pkt->buf = NULL;
+	} else {

Is the copy still needed if we're just a few bytes short? We met a similar
issue for virtio-net, which solves it by always copying the first 128 bytes
of big packets.

See receive_big()
I see, it is more sophisticated.
IIUC, virtio-net allocates an sk_buff with 128 bytes of buffer, then copies
the first 128 bytes, then adds the buffer used to receive the packet as a
frag to the skb.


Yes, and the point is that if the packet is smaller than 128 bytes, the page will be recycled.


To be clear, this only works if you use order-0 pages instead of a large buffer allocated through kmalloc(). Yet another reason to use order-0 pages.

Thanks