Re: "raw" block devices?

Mark H. Wood (mwood@mhw.oit.iupui.edu)
Fri, 18 Oct 1996 08:11:44 -0500 (EST)


On Thu, 17 Oct 1996, Linus Torvalds wrote:

>
>
> On Thu, 17 Oct 1996, Ingo Molnar wrote:
> > >
> > > Sure, there are old-fashioned databases that think they can do a better job
> > > of it than the kernel does. They are usually wrong, I suspect. They are using
> > > raw devices more for historical reasons than anything else, and they could
> > > just as well use a filesystem.
> >
> > [yes, raw devices are a hack, still RDBMS ppl use it because:]
> >
> > one not-so-obvious problem is that an RDBMS >has< to implement a
> > write-cache for itself. Thus if the block device were buffered too (in
> > the kernel), then we would have double buffering. [as it is buffered now]
>
> Not strictly true. Yes, there are ordering constraints, but you can handle
> them on a filesystem too (and people do). A raw device gives you more
> low-level control, but on the other hand it _also_ results in less chance for
> the OS to optimize data transfers for those cases where the optimizations
> would be valid.
>
> You can handle write ordering by using a log-based database (never overwrite
> any old data, so write ordering doesn't matter), and do a "fsync()" on the
> file when you commit. Voila, you just got guarantees about the file being
> on disk (assuming fsync() works as it is supposed to), and it doesn't
> matter if the OS decided it needed to write out part of the data earlier
> (in fact, that's only good for overlapping IO and calculations).

For example, RDB/VMS uses fairly ordinary files that you could just as
well access with normal QIO or RMS calls if you knew the internal
structure. I think it just maintains tight control over the ordering of
its writes when that's necessary. (Of course, VMS gives it tools to do
that, or this wouldn't work.) Digital did a *lot* of work to discover
where the order-dependencies are in their product, and it paid off.
(Trouble is, they got strapped for cash, so now it's paying off for Oracle
instead.)
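Here is a rough illustration of the log-based scheme described in the quoted message: append only, never overwrite, and fsync() at commit time. It is a minimal sketch, not how RDB/VMS or any real RDBMS does it, and it uses Python's os module because it is a thin wrapper around open(2), write(2) and fsync(2). The names AppendOnlyLog and read_records, the example path, and the length-prefixed record format are all hypothetical and exist only for this example. A real database would also need commit markers, checksums and recovery logic.

```python
import os
import struct
import tempfile

# Hypothetical on-disk record format (an assumption for this sketch):
# a 4-byte big-endian length prefix followed by the payload bytes.
HEADER = struct.Struct(">I")


class AppendOnlyLog:
    """Append-only log: bytes already in the file are never overwritten, so
    the order in which the kernel writes dirty buffers back does not matter."""

    def __init__(self, path):
        # O_APPEND makes every write land at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, payload: bytes) -> None:
        # The kernel may cache this and flush it whenever it likes; that is
        # harmless because nothing already on disk is being modified.
        data = memoryview(HEADER.pack(len(payload)) + payload)
        while data:
            n = os.write(self.fd, data)  # may be a short write; loop until done
            data = data[n:]

    def commit(self) -> None:
        # Once fsync() returns, everything appended so far is on stable
        # storage (assuming fsync() works as it is supposed to).
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)


def read_records(path):
    """Return all complete records. A truncated trailing record, e.g. one
    cut short by a crash in the middle of a write, is ignored."""
    with open(path, "rb") as f:
        buf = f.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(buf):
        (length,) = HEADER.unpack_from(buf, pos)
        end = pos + HEADER.size + length
        if end > len(buf):
            break  # incomplete record at the tail
        records.append(buf[pos + HEADER.size:end])
        pos = end
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.log")
    log = AppendOnlyLog(path)
    log.append(b"debit A 100")
    log.append(b"credit B 100")
    log.commit()  # both records are durable from here on
    log.close()
    print(read_records(path))  # [b'debit A 100', b'credit B 100']
```

Records appended after the last commit() may or may not survive a crash. Only those covered by a completed fsync() are guaranteed, which is exactly the property the quoted message relies on.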

Mark H. Wood, Lead System Programmer MWOOD@INDYVAX.IUPUI.EDU
Those who will not learn from history are doomed to reimplement it.