Re: imapd and synchronous writes

Ulrich Windl (Ulrich.Windl@rz.uni-regensburg.de)
Thu, 14 Mar 1996 08:33:22 +0100


On 13 Mar 96 at 20:49, sct@dcs.ed.ac.uk wrote:

> Hi,
>
> On Tue, 12 Mar 1996 09:08:24 +0100, Ulrich Windl
> <Ulrich.Windl@rz.uni-regensburg.de> said:
>
> > On 11 Mar 96 at 15:31, John Gardiner Myers wrote:
> >> fraioli@dg-rtp.dg.com (Marc J. Fraioli) writes:
> >> > I'm looking at the docs for CMU's Cyrus IMAP server, and came
> >> > across the following warning in a README:
> >> > [...]
> >> > What is the reason for this?
> >>
> >> The ext2 filesystem performs directory updates asynchronously. When
> >> the IMAP server (or sendmail) is given a message, it will create a
> >> file for it, write out the contents, and fsync() it before informing
> >> the sender that it has accepted responsibility for the message.
> >>
> >> However, the fsync() doesn't mean a hill of beans if the directory
> >> entry for the file doesn't get committed to disk. If the machine
>
> Yup. The solution is to fsync() the directory itself, which is
> essentially what happens automatically if you set O_SYNC on the
> directory.
>
> > You are saying that fsync() violates the POSIX requirements. It seems
> > that recent POSIX has a relaxed fsync() that "only writes essential
> > data" to the disk. I can't remember the syscall right now. I'm rather
> > sure that Ted knows about it.
>
> fdatasync(). It's in POSIX.4.
>
> > If fsync is broken, shouldn't it be fixed before 2.0?
>
> Yes, if that's what POSIX.1 really specifies. But remember that there
> is NO automatic correlation between directory entries and inodes under
> Unix; any inode may have any number of directory entries associated
> with it, including zero. I can't recall any of my POSIX books saying
> anything about directory flushing in association with fsync(), and I
> would be surprised if they did.
>
> I very much expect that the behaviour on FreeBSD with async metadata
> writes will be exactly the same, even if fsync() is used on the
> inodes.
>
> You really can't just blindly assume synchronous directory updates.
> Even on systems using ffs, where directories are updated synchronously
> (currently), it is not a wise assumption, for things may change in the
> future. FreeBSD's ffs already has an option to disable sync writes,
> and the authors are looking at alternatives to sync writes which
> preserve the metadata consistency (by using either ordered async
> writes or rollback mechanisms).

It seems that we'll need something like "opendir(..., O_SYNC)", or at
least a new mount option that makes directory writes (and only those)
synchronous by default.
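
Without either of those, the approach Stephen describes above (calling
fsync() on the directory itself) could look roughly like the sketch
below. This is only an illustration under assumptions: durable_write()
is a made-up helper name, not anything taken from imapd or sendmail;
it assumes the system lets you open() a directory read-only and
fsync() the resulting descriptor (some systems may refuse with
EINVAL); and it does not retry on EINTR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and interface are assumptions): write
 * `len` bytes to dir/name and return 0 only after both the file and
 * the containing directory have been fsync()ed. Returns -1 on error.
 */
int durable_write(const char *dir, const char *name,
                  const char *data, size_t len)
{
    char path[4096];
    size_t off = 0;
    int n, fd, dfd;

    assert(dir != NULL && name != NULL && data != NULL);

    n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path would be truncated */

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is out. */
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)w;
    }

    /* Step 1: flush the file's data and inode. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* Step 2: flush the directory, so the new entry is on disk too. */
    dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    if (fsync(dfd) < 0) {
        close(dfd);
        return -1;
    }
    return close(dfd);
}
```

A delivery agent would call something like durable_write(spooldir,
msgname, buf, buflen) and only tell the sender it has accepted the
message when the call returns 0. Whether the second fsync() really
commits the directory entry depends on the filesystem, so treat this
as a sketch rather than a guarantee.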

BTW: Does "open("file", O_CREAT|O_SYNC, ...)" cause the directory
entry to be written immediately, or does O_SYNC only apply to the file
descriptor? Maybe the POSIX team assumed what I always thought, and
therefore forgot to specify directory writes.

>
> Cheers,
> Stephen.
> --
> Stephen Tweedie <sct@dcs.ed.ac.uk>
> Department of Computer Science, Edinburgh University, Scotland.
>
>
------------
Ulrich Windl Klinikum der Universitaet Regensburg
Rechenzentrum DV-med Franz-Josef-Strauss-Allee 11
Tel: +49 941 944-5879 D-93053 Regensburg
FAX: +49 941 944-5882
Just imagine my mail address were <Ulrich.Windl@rz.uni.r.de>...