Re: imapd and synchronous writes

sct@dcs.ed.ac.uk
Wed, 13 Mar 96 20:49 GMT


Hi,

On Tue, 12 Mar 1996 09:08:24 +0100, Ulrich Windl
<Ulrich.Windl@rz.uni-regensburg.de> said:

> On 11 Mar 96 at 15:31, John Gardiner Myers wrote:
>> fraioli@dg-rtp.dg.com (Marc J. Fraioli) writes:
>> > I'm looking at the docs for CMU's Cyrus IMAP server, and came
>> > across the following warning in a README:
>> > [...]
>> > What is the reason for this?
>>
>> The ext2 filesystem performs directory updates asynchronously. When
>> the IMAP server (or sendmail) is given a message, it will create a
>> file for it, write out the contents, and fsync() it before informing
>> the sender that it has accepted responsibility for the message.
>>
>> However, the fsync() doesn't mean a hill of beans if the directory
>> entry for the file doesn't get committed to disk. If the machine

Yup. The solution is to fsync() the directory itself, which is
essentially what happens automatically if you set O_SYNC on the
directory.
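
A minimal sketch of that sequence in C, assuming the system lets you
fsync() a descriptor opened read-only on a directory. The helper name
durable_create() and its arguments are made up for illustration; they
are not taken from any server's source.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create `name` inside `dirpath`, write `len` bytes,
 * then flush first the file and then the directory that names it.
 * Returns 0 on success, -1 on any failure. */
int durable_create(const char *dirpath, const char *name,
                   const void *buf, size_t len)
{
    char path[4096];
    int plen = snprintf(path, sizeof path, "%s/%s", dirpath, name);
    if (plen < 0 || (size_t)plen >= sizeof path)
        return -1;

    /* O_EXCL: refuse to overwrite an existing message file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Write the contents, coping with short writes. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Step 1: flush the file's data and inode. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* Step 2: flush the directory, so the new entry is on disk too. */
    int dfd = open(dirpath, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc < 0 ? -1 : 0;
}
```

Only after both fsync() calls have succeeded would the server tell the
sender that it has accepted the message.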

> You are saying that fsync() violates the POSIX requirements. It seems
> that recent POSIX has a relaxed fsync() that "only writes essential
> data" to the disk. I can't remember the syscall right now. I'm rather
> sure that Ted knows about it.

fdatasync(). It's in POSIX.4.
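
As a rough sketch of how it is used (append_record() is a hypothetical
helper, and this assumes a system that actually provides fdatasync()):
the call flushes the file data plus whatever metadata is needed to read
it back, but may skip things like timestamps. Like fsync(), it says
nothing about the directory entry.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append `len` bytes to `path` and flush the data
 * with fdatasync(). Returns 0 on success, -1 on failure. */
int append_record(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* Treat a short write as a failure to keep the sketch small. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* Flush the data; non-essential metadata may stay in memory. */
    if (fdatasync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd) < 0 ? -1 : 0;
}
```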

> If fsync is broken, shouldn't it be fixed before 2.0?

Yes, if that's what POSIX.1 really specifies. But remember that there
is NO automatic correlation between directory entries and inodes under
Unix; any inode may have any number of directory entries associated
with it, including zero. I can't recall any of my POSIX books saying
anything about directory flushing in association with fsync(), and I
would be surprised if they did.
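
To make that concrete, here is a small sketch (the helper names
link_count() and demo_links() are invented for this example) that
watches one inode go from one directory entry to two and then to none,
all while the same descriptor stays open. With zero or several names
there is no single directory that fsync() on the file could sensibly
be expected to flush.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the number of directory entries (hard links) that refer to
 * the inode behind `fd`, or -1 if fstat() fails. */
long link_count(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}

/* Hypothetical demo: create `a`, add a second name `b`, then remove
 * both names. counts[] records the link count after each step.
 * Returns 0 on success, -1 on failure. */
int demo_links(const char *a, const char *b, long counts[3])
{
    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    counts[0] = link_count(fd);      /* one name: a */

    if (link(a, b) < 0) {
        unlink(a);
        close(fd);
        return -1;
    }
    counts[1] = link_count(fd);      /* two names: a and b */

    if (unlink(a) < 0 || unlink(b) < 0) {
        close(fd);
        return -1;
    }
    counts[2] = link_count(fd);      /* no names, inode still open */

    close(fd);
    return 0;
}
```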

I very much expect that the behaviour on FreeBSD with async metadata
writes will be exactly the same, even if fsync() is used on the
inodes.

You really can't just blindly assume synchronous directory updates.
Even on systems using ffs, where directories are currently updated
synchronously, it is not a wise assumption, because that may change in
the future. FreeBSD's ffs already has an option to disable sync
writes, and the authors are looking at alternatives to sync writes
that preserve metadata consistency (using either ordered async writes
or rollback mechanisms).

Cheers,
Stephen.

--
Stephen Tweedie <sct@dcs.ed.ac.uk>
Department of Computer Science, Edinburgh University, Scotland.