Re: Bug in short splice to socket?

From: David Howells
Date: Thu Jun 01 2023 - 13:15:40 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > However, this might well cause a malfunction in UDP, for example.
> > MSG_MORE corks the current packet, so if I ask sendfile() say shove 32K
> > into a packet, if, say, 16K is read from the source and entirely
> > transcribed into the packet,
>
> If you use splice() for UDP, I don't think you would normally expect
> to get all that well-defined packet boundaries.

Actually, it will. Attempting to overfill a UDP packet with splice will get
you -EMSGSIZE. It won't turn a splice into more than one UDP packet.

I wonder if the right solution actually is to declare that the problem is
userspace's. If you ask it to splice Z amount of data and it can't manage
that because the source dries up prematurely, then make it so that you assume
it always passed MSG_MORE and returns a short splice to userspace. Userspace
can retry the splice/sendfile or do an empty sendmsg() to cap the message (or
cancel it). Perhaps flushing a short message is actually a *bad* idea.

The answer then might be to make TLS handle a zero-length send() and fix the
test cases. Would the attached changes then work for you?

David
---
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..237688b0700b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -956,13 +956,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
*/
bytes = 0;
len = sd->total_len;
+
+ /* Don't block on output, we have to drain the direct pipe. */
flags = sd->flags;
+ sd->flags &= ~SPLICE_F_NONBLOCK;

/*
- * Don't block on output, we have to drain the direct pipe.
+ * We signal MORE until we've read sufficient data to fulfill the
+ * request and we keep signalling it if the caller set it.
*/
- sd->flags &= ~SPLICE_F_NONBLOCK;
more = sd->flags & SPLICE_F_MORE;
+ sd->flags |= SPLICE_F_MORE;

WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));

@@ -978,14 +982,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
sd->total_len = read_len;

/*
- * If more data is pending, set SPLICE_F_MORE
- * If this is the last data and SPLICE_F_MORE was not set
- * initially, clears it.
+ * If we now have sufficient data to fulfill the request then
+ * we clear SPLICE_F_MORE if it was not set initially.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len >= len && !more)
sd->flags &= ~SPLICE_F_MORE;
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f63e4405cf34..5d48391da16c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -995,6 +995,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
}
}

+ if (!msg_data_left(msg) && eor)
+ goto copied;
+
while (msg_data_left(msg)) {
if (sk->sk_err) {
ret = -sk->sk_err;
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index e699548d4247..7df31583f2a4 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -377,7 +377,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
char buf[TLS_PAYLOAD_MAX_LEN];
uint16_t test_payload_size;
int size = 0;
- int ret;
+ int ret = 0;
char filename[] = "/tmp/mytemp.XXXXXX";
int fd = mkstemp(filename);
off_t offset = 0;
@@ -398,6 +398,9 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
size -= ret;
}

+ if (ret < chunk_size)
+ EXPECT_EQ(send(self->fd, "", 0, 0), 0);
+
EXPECT_EQ(recv(self->cfd, buf, test_payload_size, MSG_WAITALL),
test_payload_size);