Re: Raw devices (Was:Re: NTFS, FAT32, etc.)

Martin von Loewis (martin@mira.isdn.cs.tu-berlin.de)
Thu, 8 May 1997 14:39:13 +0200


> What heavy iron databases seem to want is the ability to schedule
> all their own I/O using AIO in blocksized buffers with the kernel
> side doing the I/O direct to/from the given buffer in user space.
> No buffer copying. No kernel memory wasted in buffering. No read
> ahead other than that specifically done by the program.

OK, I understand the issue of the unnecessary copies, although I doubt
that systems with 'raw devices' really DMA directly to user space when
writing to a SCSI disk.
However, I thought that databases are interested in guaranteed completion,
i.e. once write(2) returns, the system should guarantee that the data is
really on disk. This is necessary for the transactional properties. Without
such a guarantee, you can pretty much forget about transactional recoveries
after a crash. Wouldn't O_SYNC give you the same properties? As for real
implementations: does anybody know whether the Linux ports of Adabas or
Postgres use O_SYNC?
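
To make the question concrete, here is a minimal sketch of the kind of
O_SYNC write I have in mind. The helper name write_record_sync and the
idea of appending to a log file are assumptions for illustration only;
this is not taken from Adabas or Postgres. With O_SYNC, write(2) is
specified to return only after the data has been passed to the underlying
storage; whether a drive's own write cache has flushed it is outside what
this code can show.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record to `path` using O_SYNC.
 * Returns 0 only if every byte was accepted by write(2), which with
 * O_SYNC should not return before the data has reached the device.
 * Returns -1 on any error.
 */
int write_record_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            close(fd);
            return -1;
        }
        p += n;                    /* handle short writes */
        len -= (size_t)n;
    }
    return close(fd);
}
```

A caller would treat a transaction as committed only after something
like write_record_sync("txn.log", rec, rec_len) has returned 0. Opening
without O_SYNC and calling fsync(2) after the write would be the other
obvious way to ask for the same guarantee.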

Thanks,
Martin