Re: [sqlite] light weight write barriers

From: Vladislav Bolkhovitin
Date: Mon Nov 19 2012 - 20:23:49 EST


Vladislav Bolkhovitin, on 11/17/2012 12:02 AM wrote:
The easiest way to implement this fsync would involve three things:
1. Schedule writes for all dirty pages in the fs cache that belong to
the affected file, wait for the device to report success, issue a cache
flush to the device (or request ordering commands, if available) to make
it tell the truth, and wait for the device to report success. AFAIK this
already happens, but without taking advantage of any request ordering
commands.
2. The requesting thread returns as soon as the kernel has identified
all data that will be written back. This is new, but pretty similar to
what AIO already does.
3. No write is allowed to enqueue any requests at the device that
involve the same file, until all outstanding fsync complete [3]. This is
new.

This sounds interesting as a way to expose some useful semantics to userspace.

I assume we'd need to come up with a new syscall or something since it doesn't
match the behaviour of posix fsync().

This is how I would export cache sync and requests ordering abstractions to the
user space:

For async IO (io_submit() and friends) I would extend struct iocb with flags that
set the required capabilities per iocb: whether the request is FUA, a full cache
sync, immediate [1] or not, ORDERED or not, or any combination of these.
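
A minimal sketch of what such per-request flags could look like. Everything
here is hypothetical: the HYP_* names, the bit values, and struct hyp_iocb (a
reduced stand-in for struct iocb) are illustrations only and do not exist in
the Linux AIO ABI.

```c
#include <stdint.h>

/* Hypothetical per-request capability flags; NOT part of the Linux AIO ABI. */
enum hyp_iocb_flags {
    HYP_IOCB_FUA        = 1u << 0, /* request is FUA */
    HYP_IOCB_CACHE_SYNC = 1u << 1, /* request is a full cache sync */
    HYP_IOCB_IMMEDIATE  = 1u << 2, /* "immediate", as in note [1] of the proposal */
    HYP_IOCB_ORDERED    = 1u << 3  /* request is ORDERED */
};

/* Stand-in for struct iocb, reduced to the fields this sketch needs. */
struct hyp_iocb {
    int      fd;
    uint64_t offset;
    uint64_t nbytes;
    uint32_t flags;   /* any combination of hyp_iocb_flags */
};

/* Fill in a write request; the flags are chosen separately for each iocb. */
static struct hyp_iocb hyp_prep_write(int fd, uint64_t offset,
                                      uint64_t nbytes, uint32_t flags)
{
    struct hyp_iocb cb = { fd, offset, nbytes, flags };
    return cb;
}

/* Returns 1 if the request was marked ORDERED, 0 otherwise. */
static int hyp_is_ordered(const struct hyp_iocb *cb)
{
    return (cb->flags & HYP_IOCB_ORDERED) != 0;
}
```

A caller would combine flags per request, for example
hyp_prep_write(fd, 0, 4096, HYP_IOCB_FUA | HYP_IOCB_ORDERED) for a commit
record and plain 0 for ordinary data writes.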

For regular read()/write() I would add one more flag to the "flags" parameter of
sync_file_range(): whether this sync is immediate or not.
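
As a rough illustration, the sketch below composes the three existing
sync_file_range() flags with one extra bit. HYP_SYNC_FILE_RANGE_IMMEDIATE and
both helper functions are hypothetical; the bit value is arbitrary. Current
kernels reject unknown flag bits with EINVAL, so the sketch masks the
hypothetical bit off before making the real call.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical flag: NOT defined by Linux. The bit is arbitrary and only
 * chosen so that it does not collide with the three real flags. */
#define HYP_SYNC_FILE_RANGE_IMMEDIATE 0x100u

/* Build the flag word a caller would pass under the proposal. */
static unsigned int hyp_sync_flags(int immediate)
{
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;
    if (immediate)
        flags |= HYP_SYNC_FILE_RANGE_IMMEDIATE;
    return flags;
}

/* Sync a byte range of fd. The hypothetical bit is stripped because
 * today's kernel would fail the call with EINVAL if it were passed. */
static int hyp_sync_range(int fd, off_t offset, off_t nbytes, int immediate)
{
    unsigned int flags = hyp_sync_flags(immediate);
    flags &= ~HYP_SYNC_FILE_RANGE_IMMEDIATE;
    return sync_file_range(fd, offset, nbytes, flags);
}
```

With a real implementation the masking step would disappear and the kernel
would interpret the extra bit itself.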

To enforce ordering rules I would add one more command to fcntl(). It would make
the latest submitted write on this fd ORDERED.

Correction. To avoid possible races, it would be better for the new fcntl() command to specify that the next N read()/write()/sync() calls are ORDERED.

For instance, in the simplest case of N=1, the next write() after the fcntl() call would be handled as ORDERED.
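
The counting semantics can be modelled in user space as below. This is only a
sketch under stated assumptions: struct hyp_file, hyp_fcntl_set_ordered() and
hyp_begin_io() are hypothetical names, and no such fcntl() command exists in
Linux.

```c
/* Hypothetical per-open-file state for the proposed fcntl() command. */
struct hyp_file {
    unsigned int ordered_left; /* calls still to be treated as ORDERED */
};

/* Models the proposed fcntl() command: the next n calls become ORDERED. */
static void hyp_fcntl_set_ordered(struct hyp_file *f, unsigned int n)
{
    f->ordered_left = n;
}

/* Models the entry of a read()/write()/sync() call on the file.
 * Returns 1 if this call must be handled as ORDERED, 0 otherwise,
 * and consumes one unit of the counter when it is. */
static int hyp_begin_io(struct hyp_file *f)
{
    if (f->ordered_left == 0)
        return 0;
    f->ordered_left--;
    return 1;
}
```

With N=1, the first hyp_begin_io() after hyp_fcntl_set_ordered() reports
ORDERED and the one after it does not, because the marking is set up before
the call is issued rather than applied to a write that is already in flight.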

(Unfortunately, it doesn't look like this old read()/write() interface has space for a more elegant solution)

Vlad