Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns

From: Matt Whitlock
Date: Wed Jul 19 2023 - 17:02:22 EST


On Wednesday, 19 July 2023 16:16:07 EDT, Linus Torvalds wrote:
> The *ONLY* reason for splice() existing is for zero-copy.

The very first sentence of splice(2) reads: "splice() moves data between two file descriptors without copying between kernel address space and user address space." Thus, it is not unreasonable to believe that the point of splice is to avoid copying between user-space and kernel-space.

If you use read() and write(), then you're making two copies. If you use splice(), then you're making one copy (or zero, but that's an optimization that should be invisible to the user).
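To make the difference concrete from user space, here is a minimal sketch for Linux with glibc. The names copy_with_read_write(), move_with_splice(), and demo_pipe_to_file() are hypothetical helpers invented for illustration. Note that splice() requires one of the two descriptors to be a pipe, and that whether the kernel moves page references or copies page data internally is not observable from this code.

```c
#define _GNU_SOURCE             /* exposes splice() in <fcntl.h> on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* read()+write(): the data crosses the user/kernel boundary twice,
 * once into buf and once back out of it. */
static ssize_t copy_with_read_write(int in_fd, int out_fd, size_t len)
{
    char buf[4096];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                              /* EOF */
        for (ssize_t off = 0; off < n; ) {      /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* splice(): no user-space buffer at all. One of the two
 * descriptors must refer to a pipe. */
static ssize_t move_with_splice(int in_fd, int out_fd, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = splice(in_fd, NULL, out_fd, NULL, len - done, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                              /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical demo: queue msg in a pipe, move it into an unlinked
 * temporary file by either method, then read the file back into out.
 * Assumes msg is small enough to fit in the pipe buffer. */
ssize_t demo_pipe_to_file(const char *msg, int use_splice,
                          char *out, size_t out_size)
{
    size_t len = strlen(msg);
    assert(len <= out_size);

    int p[2];
    if (pipe(p) < 0)
        return -1;

    char path[] = "/tmp/splice-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    unlink(path);               /* file vanishes once fd is closed */

    ssize_t got = -1;
    ssize_t queued = write(p[1], msg, len);
    close(p[1]);                /* reader sees EOF after the queued bytes */
    if (queued == (ssize_t)len) {
        ssize_t moved = use_splice
            ? move_with_splice(p[0], fd, len)
            : copy_with_read_write(p[0], fd, len);
        if (moved == (ssize_t)len)
            got = pread(fd, out, len, 0);
    }
    close(p[0]);
    close(fd);
    return got;                 /* bytes read back, or -1 on failure */
}
```

Calling demo_pipe_to_file(msg, 1, out, sizeof out) takes the splice() path and passing 0 takes the read()+write() path. Both should leave the same bytes in out; the only difference visible to the caller is that the splice() path never needs a user-space buffer for the transfer.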

> And no, we don't start some kind of crazy "versioned zero-copy with
> COW". That's a fundamental mistake.

Agreed. splice() should steal the reference if it can and copy the page data if it must. Note that, even in the slow case where the page data must be copied, this still gives a better-than-50% speedup over read()+write(), since an entire copy (and one syscall) is elided.

> IF YOU DON'T UNDERSTAND THE *POINT* OF SPLICE, DON'T USE SPLICE.

Thanks for being so condescending. Your reputation is deserved.