Re: [RFC][PATCH] New iovec support & VFS changes

From: Avi Kivity
Date: Tue Dec 20 2005 - 12:59:25 EST


Badari Pulavarty wrote:

I was trying to add support for preadv()/pwritev() for threaded
databases. Currently the patch is in -mm tree.

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/support-for-preadv-pwritev.patch

This needs a new set of system calls. Ulrich Drepper asked why,
instead of adding system calls for the limited functionality they
provide, we don't add a new iovec interface as follows (offset per
segment), which provides greater functionality & flexibility.

+struct niovec
+{
+ void __user *iov_base;
+ __kernel_size_t iov_len;
+ __kernel_loff_t iov_off; /* NEW */
+};
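
For illustration only, here is a minimal user-space sketch of what an
offset-per-segment read would mean, emulated with one pread() per
segment. The names user_niovec and emulate_niovec_read are hypothetical
and not part of the patch; the real proposal would do this inside the
kernel in a single call.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space mirror of the proposed struct niovec. */
struct user_niovec {
    void  *iov_base;  /* destination buffer */
    size_t iov_len;   /* bytes wanted for this segment */
    off_t  iov_off;   /* file offset for this segment */
};

/*
 * Read each segment from its own file offset using pread().
 * Returns the total number of bytes read, or -1 if the very first
 * pread() fails. Stops at the first error or short read.
 */
ssize_t emulate_niovec_read(int fd, const struct user_niovec *vec, int count)
{
    ssize_t total = 0;

    for (int i = 0; i < count; i++) {
        ssize_t n = pread(fd, vec[i].iov_base, vec[i].iov_len,
                          vec[i].iov_off);
        if (n < 0)
            return total > 0 ? total : -1;
        total += n;
        if ((size_t)n < vec[i].iov_len)
            break;  /* short read: end of file or partial transfer */
    }
    return total;
}
```

This costs one system call per segment, which is the overhead a
kernel-side offset-per-segment interface would avoid.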

In order to support this, we need to change all the file_operations
(readv/writev) and their helper functions to take this new structure.

I took a stab at doing it and I want feedback on whether this is
acceptable. All the patch does is make the kernel use the new structure,
while existing syscalls like readv()/writev() still deal with the
original one to keep compatibility. (Pipes and sockets need changing
too, which I have not addressed yet.)

Is this the right approach?



You can io_submit() a list of IO_CMD_PREAD[V]s and immediately io_getevents() them. In addition to specifying different file offsets you can mix reads and writes, mix file descriptors, and reap nonblocking events quickly (by specifying a timeout of zero).

Sure, it's two syscalls instead of one, but it's much more flexible, and databases should be using aio anyway. Oh, and no kernel changes needed, apart from merging vectored aio.
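
A rough sketch of the submit-then-reap pattern described above, using the raw Linux AIO syscalls rather than libaio so that it needs no extra library. It uses the kernel ABI opcode IOCB_CMD_PREAD from <linux/aio_abi.h>, which corresponds to libaio's IO_CMD_PREAD. The names read_req, MAX_REQS and aio_read_batch are hypothetical helpers for illustration, and this version blocks until every read completes instead of polling.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical description of one read: descriptor, buffer, length, offset. */
struct read_req {
    int       fd;
    void     *buf;
    size_t    len;
    long long off;
};

enum { MAX_REQS = 16 };  /* arbitrary batch limit for this sketch */

/*
 * Submit `count` positional reads with one io_submit() call, then reap
 * all completions with one io_getevents() call. Each result (bytes read
 * or a negative errno value) is stored in res[] at the request's index.
 * Returns 0 on success, -1 on failure.
 */
int aio_read_batch(const struct read_req *reqs, int count, long long *res)
{
    aio_context_t ctx = 0;
    struct iocb cbs[MAX_REQS];
    struct iocb *ptrs[MAX_REQS];
    struct io_event events[MAX_REQS];
    int rc = -1;

    if (count < 1 || count > MAX_REQS)
        return -1;
    if (syscall(SYS_io_setup, (long)count, &ctx) < 0)
        return -1;

    memset(cbs, 0, sizeof(cbs));
    for (int i = 0; i < count; i++) {
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes = reqs[i].fd;          /* descriptors may differ */
        cbs[i].aio_buf = (uint64_t)(uintptr_t)reqs[i].buf;
        cbs[i].aio_nbytes = reqs[i].len;
        cbs[i].aio_offset = reqs[i].off;         /* per-request offset */
        cbs[i].aio_data = (uint64_t)i;           /* echoed back in the event */
        ptrs[i] = &cbs[i];
    }

    if (syscall(SYS_io_submit, ctx, (long)count, ptrs) == count) {
        /* min_nr == nr == count and a NULL timeout: wait for all of them. */
        long got = syscall(SYS_io_getevents, ctx, (long)count, (long)count,
                           events, NULL);
        if (got == count) {
            for (long i = 0; i < got; i++)
                res[events[i].data] = events[i].res;
            rc = 0;
        }
    }
    /* io_destroy() also waits for or cancels anything still in flight. */
    syscall(SYS_io_destroy, ctx);
    return rc;
}
```

To poll instead of wait, one would pass a min_nr of 0 and a zeroed struct timespec as the last io_getevents() argument. Writes can be mixed in by setting IOCB_CMD_PWRITE on individual requests.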

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
