Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe

From: David Howells
Date: Fri Jun 30 2023 - 12:51:45 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

>
> Quite the reverse. I'd be willing to *simplify* splice() by just
> saying "it was all a mistake", and just turning it into wrappers
> around read/write. But those patches would have to be radical
> simplifications, not adding yet more crud on top of the pain that is
> splice().
>
> Because it will hurt performance. And I'm ok with that as long as it
> comes with huge simplifications. What I'm *not* ok with is "I mis-used
> splice, now I want splice to act differently, so let's make it even
> more complicated".

If we want to go down the simplification route, then the patches I posted
might be a good start.

The idea I tried to work towards is that the pipe only ever contains private
pages in it that only the pipe has a ref on and that no one else can access
until they come out the other end again. I got rid of the ->confirm() pipe
buf op and would like to kill off all of the others too.

I simplified splice() by:

- Making sure any candidate pages are uptodate right up front.

- Allowing automatic stealing of pages from the pagecache if no one else is
using them. This should avoid losing a chunk of the performance that
splice is supposed to gain - but if you're serving pages repeatedly in a
webserver with this, it's going to be a problem.

Possibly this should be contingent on SPLICE_F_MOVE - though the manpage
says "*from* the pipe" implying it's only usable on the output side.

- Copying in every other circumstance.

I simplified vmsplice() by:

- If SPLICE_F_GIFT is set, attempting to steal whole pages in the buffer up
front if not in use by anyone else.

- Copying in every other circumstance.

That said, there are still sources that I didn't touch yet that attempt to
insert pages into a pipe: relayfs (which does some accounting stuff based on
the final consumption of the pages it inserted), sockets (which don't allow
inserted pages to be stolen) and notifications (which don't want to allocate
at notification time - but I can deal with that). And there's tee() (which
would need to copy the data). And pipe-to-pipe splice (which could steal
whole pages, but would otherwise have to copy).


If you would prefer to go for utter simplification, we could make sendfile()
from a buffered file just call sendmsg() directly with MSG_SPLICE_PAGES set
and ignore the pipe entirely (I'm tempted to do this anyway) and then make
splice() to a pipe just do copy_splice_read() and vmsplice() to a pipe do
writev().

I wonder how much splice() is used compared to sendfile().


I would prefer to leave splice() and vmsplice() as they are now and adjust the
documentation. As you say, they can be considered a type of zerocopy.

David