Re: imapd and synchronous writes

John Gardiner Myers (jgm+@cmu.edu)
Mon, 18 Mar 1996 19:25:50 -0500 (EST)


sct@dcs.ed.ac.uk writes:
> Hi,

Hi. Thank you for a very well-reasoned response.

For perspective, the problem here is that of reliability of Internet
mail. E-mail users have extremely high reliability expectations, and
keeping mail from disappearing because of predictable events such as
power outages is a problem we e-mail applications weenies take very
seriously. This problem reduces to the task of knowing when a message
has been committed to non-volatile storage, so one can safely inform
the upstream peer that they can forget about the message. It would
behoove you OS weenies to keep this requirement in mind.
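In code, the requirement comes down to an ordering rule: nothing goes back to the upstream peer until every byte has been written and fsync() has returned success. A minimal C sketch, assuming a POSIX system; commit_message() is a hypothetical name, and its caller is assumed to send the acknowledgement (for SMTP, the reply to the end of DATA) only when it returns 0:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, then force it to stable
 * storage. Returns 0 only after fsync() succeeds; returns -1 (errno set)
 * otherwise, in which case the caller must not acknowledge the message. */
int commit_message(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;               /* short write: carry on with the rest */
        len -= (size_t)n;
    }
    /* The commit point: once this returns 0 the upstream peer may be
     * told to forget the message. */
    return fsync(fd);
}
```

If the message lives in a newly created file, its directory entry is a separate matter; that is the directory fsync() question further down.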

> Modern EIDE drives with write-behind can also screw the
> O/S's sync write ordering.

The problem is not that of ordering, but of knowing when something has
been committed to non-volatile storage. If the drives tell the OS
that a write has been committed before it actually has been, you've
got some pretty unreliable drives.

> And finally, NFS has NEVER made any guarantees like this

And it has always been the case that people who run their mail
system (spool/mqueue or spool/mail) over NFS Deserve To Lose.

> Not true. {int fd = open(".", O_RDONLY, 0); int rc = fsync(fd); close(fd)}

Since the extra namei() lookup in that open() could be a performance
issue, systems that need this should provide a feature test for
applications to key off of. Some have suggested "__linux", but that's
not a feature test.
It's a system test.
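The quoted one-liner ignores errors, and an application that relies on it needs to know whether the fsync() actually worked. A slightly more careful C version follows. fsync_dir() is a hypothetical name, and HYPOTHETICAL_DIR_UPDATES_ASYNC stands in for the kind of feature test argued for above; no real system defines that macro.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush the directory `path` (the effect of
 * create/rename/unlink inside it) to disk. Returns 0 on success,
 * -1 with errno set on failure. */
int fsync_dir(const char *path)
{
    int fd = open(path, O_RDONLY);     /* the namei() lookup happens here */
    if (fd < 0)
        return -1;
    if (fsync(fd) < 0) {
        int saved = errno;             /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* HYPOTHETICAL feature test: no system defines this macro. A system whose
 * directory updates are asynchronous would define it; elsewhere the extra
 * open()/fsync() is compiled out and SYNC_DIR() simply yields 0. */
#ifdef HYPOTHETICAL_DIR_UPDATES_ASYNC
#define SYNC_DIR(path) fsync_dir(path)
#else
#define SYNC_DIR(path) ((void)(path), 0)
#endif
```

Where directory updates are already synchronous, the macro would be left undefined and SYNC_DIR() costs nothing.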

> I never said it was. If you really want that behaviour, ext2fs gives
> you three ways to request it: by filesystem default, by per-directory
> attribute, or explicitly on demand by the application.

By filesystem default is not administratively practical.

My installation instructions currently tell people to set the
synchronous bit on the appropriate directories; this thread was
started by someone commenting on those instructions. But
how many Linux distributions install /var/spool/mqueue or
/var/spool/mail with the synchronous bit set? How many people on this
list have that bit set on those directories? I suspect the
number is close to zero.
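An installer or a startup check could at least detect the missing bit. A Linux-only sketch, assuming the present-day FS_IOC_GETFLAGS ioctl and FS_SYNC_FL flag from <linux/fs.h> (the same attribute that chattr +S sets and lsattr displays); dir_has_sync_bit() is a hypothetical name:

```c
#include <fcntl.h>
#include <linux/fs.h>     /* FS_IOC_GETFLAGS, FS_SYNC_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical check: is the synchronous-update attribute (chattr +S) set
 * on path? Returns 1 if set, 0 if clear, -1 if it cannot be determined
 * (path missing, or a filesystem without these attributes). Linux-only. */
int dir_has_sync_bit(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int attr = 0;                         /* the flags are passed as an int */
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &attr);
    close(fd);
    if (rc < 0)
        return -1;
    return (attr & FS_SYNC_FL) ? 1 : 0;
}
```

For example, an installer might call dir_has_sync_bit("/var/spool/mail") and warn on anything but 1. A -1 result also covers filesystems that do not implement these attributes, so a check like this can only warn, not prove safety.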

By the application is best, since the application developer tends to
know better than the sysadmin or the distribution vendor when this is
necessary. But there needs to be a feature test.
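One way an application can ask for synchronous writes on its own, without help from the administrator, is the POSIX O_SYNC open flag. A minimal sketch; write_sync_file() is a hypothetical name. O_SYNC is not guaranteed to make the new file's directory entry durable, so the parent directory still needs its own fsync() unless it is synchronous:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path and write buf to it with O_SYNC, so
 * each write() returns only once the data is on stable storage as far as
 * the OS can tell. Returns 0 on success, -1 (errno set) on failure; if a
 * write fails, the partial file is removed. */
int write_sync_file(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            unlink(path);
            errno = saved;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    /* The directory entry for path may still be in memory only. */
    return close(fd);
}
```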

> It's worth
> remembering that even on a sync-metadata ffs, rename() is not
> guaranteed to be atomic, and you can be left with both the old and the
> new dirents present after a crash.

That's not relevant to this set of applications. What's relevant is
knowing when the operation has been completed. Before then, the worst
that can happen is a duplicate delivery.
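The usual way to get that property is to write to a temporary name, force it to disk, rename() it into place, and then force the directory to disk; only then is the message acknowledged. A C sketch under POSIX assumptions; deliver(), write_all() and fsync_dir() are hypothetical names, and tmp and dest are assumed to be in directory dir:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>        /* rename() */
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on EINTR and short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Flush a directory's entries to disk. */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Hypothetical delivery routine; tmp and dest must both live in dir.
 * Returns 0 only when the message is committed and may be acknowledged. */
int deliver(const char *dir, const char *tmp, const char *dest,
            const char *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* 1. Message data on disk under the temporary name. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    /* 2. Give it its final name (atomic to other processes, not
     *    necessarily across a crash). */
    if (rename(tmp, dest) < 0)
        return -1;
    /* 3. Make the rename itself durable; until then a crash may undo it. */
    return fsync_dir(dir);
}
```

A crash at any point before deliver() returns leaves the peer without an acknowledgement, so it resends; if the rename had already reached disk, that is the duplicate delivery mentioned above, not a lost message.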

> No, but they can easily "chattr -R +S /var/spool/mail". If you mount
> a ffs partition on /var/spool with delayed writes enabled, you have
> exactly the same problem. That comes down to a broken installation.

So, how many Linux distributions are *not* broken this way? Can I
interest the Linux community in fixing this?

> The
> real deficiency is the lack of any defined semantics in Unix/POSIX,
> and the lack of any standard way for an application to request a
> certain level of service with regards to directories.

To the extent that POSIX grossly under-specifies Unix, that is a
problem with POSIX. The NT POSIX subsystem is a prime example of a
POSIX-conforming system that is useless for writing applications.

To the extent that the Linux community uses lack of specification in
POSIX as an excuse to fail to provide necessary functionality, that is
a problem with the Linux community.

-- 
_.John G. Myers		Internet: jgm+@CMU.EDU
			LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up