Re: "raw" block devices?

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 17 Oct 1996 19:41:52 +0300 (EET DST)


On Fri, 18 Oct 1996, David Monro wrote:
>
> Something I've been wondering about for a bit - why doesn't linux have the
> equivalent of the "raw" disk devices present on most other unixen I can think
> of (OSF, Solaris, SunOS, *BSD)? (For people who haven't seen them, the device
> name is traditionally the normal block device with a 'r' prefixed, and it is
> a char device rather than a block one). I was talking to a unix kernel guru I
> know about what they are used for, and there seem to be two things:

The main reason there are no raw devices is that I personally think that
raw devices are a stupid idea.

> 1) Messing with the raw device doesn't go through the buffer cache, so programs
> which basically scan a large device (eg fsck) don't trash the cache. Seems
> reasonable. Does e2fsck have some nifty way of not trashing cache currently?

I don't think "mkfs" or "fsck" is an argument for raw devices.

Yes, they trash the buffer cache, but so what? The buffer cache doesn't
actually help them, but it doesn't have much of a detrimental effect
either. And having a raw device just for fsck is silly: it means adding
extra code for something that is rarely needed, in a case where the
difference between the buffer cache and a raw device is not something you
really care about.

> Also this allows eg database systems to be given a slice of disk which they
> are in complete control of, and can maybe manage better than the normal
> buffering (known access patterns etc).

That's a theory I don't subscribe to myself.

Sure, there are old-fashioned databases that think they can do a better job
of it than the kernel does. They are usually wrong, I suspect. They are using
raw devices more for historical reasons than anything else, and they could
just as well use a filesystem.

There _are_ databases that successfully use filesystems for storage, and they
often have obvious advantages (like having BLOBs as separate files so that
you can actually access those satellite pictures without going through the
database etc).

> 2) Because of the above, it should be possible to get data straight from the
> device into user memory without any copying.

..which is why there is "mmap". mmap is a much better interface anyway.

Now, admittedly mmap() doesn't work on any raw partitions either, but that's
more due to lack of interest than anything else. Nobody has really needed it
enough to actually write the code. Doing a readpage/writepage for a raw
partition is kind of trivial.

Again, use files and it already works. "Look ma, no memcpy".
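
A minimal sketch of the file-plus-mmap approach, assuming a POSIX system.
The function name count_newlines_mmap is made up for illustration. It walks
the file's bytes directly through the mapping, with no read() into a user
buffer:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count newline bytes in a file by mapping it instead of read()ing it.
 * Hypothetical helper for illustration. Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)   /* reads go straight to mapped pages */
        if (p[i] == '\n')
            count++;

    munmap((void *)p, len);
    return count;
}
```

The program never copies the data into a buffer of its own; it only touches
the pages the kernel has mapped in.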

> Actually I guess what is needed is not necessarily a new device, but possibly
> an extra (non-portable, but hey) flag for open (and maybe mmap?) to say `don't
> cache this, I'm not going to see it again'. The device is just a way of
> saying this without having to code it in the program.

We do need a flag like this, but not for read. For mmap it would probably
make sense to have a "MAP_NOCACHE" flag that just means that when the kernel
maps a page into the address space of the process it does _not_ add that
page to the page cache (so when the page is thrown out for some reason it
has to be read back in).
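
"MAP_NOCACHE" above is only a proposed name, so the sketch below does not
use it. As a rough stand-in for "don't cache this, I'm not going to see it
again", POSIX has posix_fadvise() with POSIX_FADV_DONTNEED. It is purely
advisory and works on a file descriptor rather than on a mapping, so it is
not the same thing as the flag described here. The helper name
drop_cached_pages is hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hint that cached pages of this file will not be needed again.
 * Hypothetical helper for illustration. Returns 0 on success, -1 if the
 * file cannot be opened, or the positive error number that
 * posix_fadvise() reports. */
int drop_cached_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with length 0 means "the whole file" */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return err;
}
```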

Linus