Re: [RFC]: Support for zero-copy TCP transmit of user space data

From: Vladislav Bolkhovitin
Date: Fri Dec 19 2008 - 14:18:11 EST


Jens Axboe, on 12/19/2008 10:07 PM wrote:
On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
David M. Lloyd, on 12/18/2008 09:43 PM wrote:
On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
The iSCSI target driver iSCSI-SCST was part of the patchset (http://lkml.org/lkml/2008/12/10/293). A nice optimization was implemented for it: TCP zero-copy transmit of user space data. The patch implementing this optimization was also sent in the patchset; see http://lkml.org/lkml/2008/12/10/296.
I'm probably ignorant of about 90% of the context here, but isn't this the sort of problem that was supposed to have been solved by vmsplice(2)?
No, vmsplice can't help here. iSCSI-SCST is a kernel space driver. But even if it were a user space driver, vmsplice wouldn't change much. It gives the user no way to know when transmission of the data has finished. So it is intended to be used as: vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> vmsplice() it. But at the mmap() stage the kernel has to zero all the newly mapped pages, and zeroing memory isn't much faster than copying it. Hence there would be no considerable performance increase.
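
For illustration only, one round of the vmsplice() -> munmap() -> mmap() cycle described above might look like the following user space sketch. The helper name vmsplice_one_buffer() is hypothetical and not part of the patchset, and the memcpy() merely stands in for an application producing its data directly in the buffer. The sketch assumes len fits within the pipe's capacity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* vmsplice() */
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/*
 * Hypothetical helper: map a fresh buffer, fill it, hand its pages to the
 * pipe with vmsplice(), then unmap it. Returns the number of bytes queued
 * in the pipe, or -1 on error.
 */
ssize_t vmsplice_one_buffer(int pipe_wr, const void *data, size_t len)
{
    assert(data != NULL && len > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t map_len = (len + page - 1) / page * page;

    /* mmap() stage: the kernel has to supply zero-filled pages here. */
    char *buf = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Stand-in for the application generating data in place. */
    memcpy(buf, data, len);

    /* The pipe now references the buffer's pages; no data copy is made. */
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);

    /*
     * There is no notification of when the pages are no longer in use, so
     * the buffer is dropped rather than reused. The pipe keeps its own
     * references to the pages.
     */
    munmap(buf, map_len);
    return n;
}
```

Every call pays for a new mapping and its zeroed pages, which is the cost the paragraph above compares to simply copying the data.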

vmsplice() isn't the right choice, but splice() very well could be. You
could easily use splice internally as well. The vmsplice() part sort-of
applies in the sense that you want to fill pages into a pipe, which is
essentially what vmsplice() does. You'd need some helper to do that.
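
As a rough user space illustration of the pipe-based flow suggested here (not the in-kernel helper being discussed), data can be moved from one descriptor to another through an intermediate pipe with splice(2). The function name splice_copy() is hypothetical. In the scenario under discussion out_fd would be a connected TCP socket, but any descriptor that supports splice works.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* splice() */
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from in_fd to out_fd through a
 * private pipe. Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    assert(in_fd >= 0 && out_fd >= 0);

    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Fill the pipe with page references from the source. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)     /* end of input */
            break;

        /* Drain everything just queued into the destination. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                               SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto out;
            }
            left -= m;
        }

        total += n;
        len -= (size_t)n;
    }

out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

Note that the data never appears in a user space buffer on this path.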

Sorry, Jens, but splice() works only if there is a file handle on the other side, so user space doesn't see the data buffers. But SCST needs to serve a wider range of use cases, like reading data with decompression from a virtual tape, where the decompression is done in user space. For those, only a complete zero-copy network send, like the one I implemented, can give the best performance.

And the ack-on-xmit-done bits are something that splice-to-socket needs
anyway, so I think it'd be quite a suitable choice for this.

So, are you saying that splice() could also benefit from a zero-copy transmit feature like the one I implemented?

Thanks,
Vlad

