Re: OFFTOPIC: Regarding NT vs Linux

Dean Gaudet (dgaudet-list-linux-kernel@arctic.org)
Wed, 24 Sep 1997 03:46:38 -0700 (PDT)


On Tue, 23 Sep 1997, Larry McVoy wrote:

> The file system, when in O_DIRECT mode, does DMA from the disk to the user
> buffer. The network, when in page aligned (note the valloc) does copy
> on write on transmit (and page flips on receive). The networking hardware
> does the checksum. In both cases, the processor never sees the data.

Interesting ... but does it require you to use those non-portable
constructs? Or will a more traditional mmap() approach do the job as
well?

Solaris 2.6 w/sun's ATM card also does zero-copy TCP. I believe its main
requirement is that you write() a multiple of 16k from a mmap()d file. It
unfortunately won't do it if you use writev(). (Think http headers, and
the first chunk of the file ... apache 1.3 does this.)
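
For anyone who hasn't seen the pattern, here is a rough sketch of the
header-plus-file writev() I mean. It is not Apache's actual code;
send_header_and_file() is a name made up for illustration, and a real
server would loop on short writes and send large files in chunks.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send hdr followed by the whole file at path to
 * out_fd with a single writev().  Returns bytes written, or -1. */
ssize_t send_header_and_file(int out_fd, const char *hdr, const char *path)
{
    struct stat st;
    struct iovec iov[2];
    void *map = NULL;
    ssize_t n;
    int iovcnt = 1;
    int fd;

    assert(hdr != NULL && path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    iov[0].iov_base = (void *)hdr;          /* response header */
    iov[0].iov_len = strlen(hdr);

    if (st.st_size > 0) {                   /* mmap() rejects length 0 */
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
            close(fd);
            return -1;
        }
        iov[1].iov_base = map;              /* file body, straight from the map */
        iov[1].iov_len = (size_t)st.st_size;
        iovcnt = 2;
    }
    close(fd);                              /* the mapping outlives the fd */

    n = writev(out_fd, iov, iovcnt);        /* may be short; real code loops */

    if (map != NULL)
        munmap(map, (size_t)st.st_size);
    return n;
}
```

The header and the body go out in one system call, which is exactly the
case the Solaris zero-copy path reportedly doesn't cover.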

> SGI is looking at a copy_file() like interface that I designed before I
> left. It's called "splice(from, to, length)" and I have a few notes I
> can post if the list is interested.

Sun is looking at this too.

From the point of view of a web server, your splice() works well for
non-byte-range serving. But as soon as you have to do byte-range serving
you'd end up with a series of lseek()s and splice()s. Whereas with mmap()
you can get by with just a series of writev()s. Byte-range serving isn't
in huge demand at the moment (only PDF files make use of it).
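
To make the mmap() side concrete, here is a minimal sketch of sending
several byte ranges out of one mapping with a single writev(). The
struct byte_range, send_ranges() and MAX_RANGES names are made up for
this example, and a real multi-range reply would also interleave a
multipart header before each range.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_RANGES 16   /* arbitrary cap for this sketch */

/* Inclusive offsets, the way HTTP byte ranges are written. */
struct byte_range {
    size_t first;
    size_t last;
};

/* Hypothetical helper: base points at the start of an mmap()d file of
 * file_len bytes.  Each range becomes one iovec entry, so no lseek() is
 * needed between ranges.  Returns bytes written, or -1. */
ssize_t send_ranges(int out_fd, const char *base, size_t file_len,
                    const struct byte_range *r, int nranges)
{
    struct iovec iov[MAX_RANGES];
    int i;

    assert(base != NULL && r != NULL);

    if (nranges < 1 || nranges > MAX_RANGES)
        return -1;

    for (i = 0; i < nranges; i++) {
        /* Reject ranges that are inverted or run past end of file. */
        if (r[i].first > r[i].last || r[i].last >= file_len)
            return -1;
        iov[i].iov_base = (void *)(base + r[i].first);
        iov[i].iov_len = r[i].last - r[i].first + 1;
    }

    return writev(out_fd, iov, nranges);    /* may be short; real code loops */
}
```

With an fd-based interface the same request turns into one seek and one
transfer call per range.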

To serve a static file a web server needs the stat() info to generate the
header. It'd be interesting to see how an open_stat() call, which opens a
file and returns an fstat() of it, would perform when used with splice(). And
contrast that against a mmap_file() call which takes a filename, and
returns both a stat() of it and an mmapped region (but no filehandle). I'd
guess mmap_file() would be a win on shorter files.
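
Neither call exists anywhere as far as I know, so the following is only
a user-space emulation to pin down the interfaces I have in mind. The
signatures are my guess. The emulation obviously still pays for every
underlying system call; the point of a real kernel version would be to
fold them into one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulated open_stat(): open path and fill *st from the open file.
 * Returns the fd, or -1 on error. */
int open_stat(const char *path, int flags, struct stat *st)
{
    int fd;

    assert(path != NULL && st != NULL);

    fd = open(path, flags);
    if (fd < 0)
        return -1;
    if (fstat(fd, st) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Emulated mmap_file(): fill *st and return a read-only mapping of the
 * whole file.  No descriptor is left open.  Returns MAP_FAILED on error
 * (including zero-length files, which mmap() rejects). */
void *mmap_file(const char *path, struct stat *st)
{
    void *map;
    int fd;

    fd = open_stat(path, O_RDONLY, st);
    if (fd < 0)
        return MAP_FAILED;

    map = mmap(NULL, (size_t)st->st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid without the fd */
    return map;
}
```

The caller gets st.st_size and st.st_mtime for the header from either
call, and with mmap_file() it has only the munmap() to do afterwards.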

Dean