"raw" block devices?

David Monro (davidm@fuzzbox.psrg.cs.usyd.edu.au)
Fri, 18 Oct 1996 01:57:52 +1000 (EST)


Something I've been wondering about for a bit: why doesn't Linux have the
equivalent of the "raw" disk devices present on most other unixen I can think
of (OSF, Solaris, SunOS, *BSD)? (For people who haven't seen them, the device
name is traditionally the normal block device name with an 'r' prefixed, and it
is a char device rather than a block one.) I was talking to a unix kernel guru I
know about what they are used for, and there seem to be two things:

1) I/O on the raw device doesn't go through the buffer cache, so programs
which basically scan a large device (e.g. fsck) don't trash the cache. Seems
reasonable. Does e2fsck currently have some nifty way of not trashing the cache?
This also allows e.g. database systems to be given a slice of disk which they
are in complete control of, and which they can maybe manage better than the
normal buffering does (known access patterns etc).

2) Because of the above, it should be possible to get data straight from the
device into user memory without any copying. This should be a big win, e.g. for
the database system mentioned above. Actually it should be possible to
do this anyway using copy-on-write: the cache and the user space share the same
physical memory, and the kernel copies the page only if the program modifies
it. Currently I believe we don't do this; correct me if I am wrong.
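
For point 1, here is a rough sketch of what an fsck-style scan that tries not
to trash the cache could look like. It assumes a system that provides
posix_fadvise(), a POSIX interface that postdates this post and is not
something the kernel discussed here had. The function name
scan_without_caching and the 64 KB chunk size are made up for illustration.
The advice is only a hint, so the kernel is free to ignore it.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a file or device from start to end, asking
 * the kernel to drop each chunk from its cache once we are done with it.
 * Returns the number of bytes read, or -1 on error. */
long long scan_without_caching(const char *path)
{
    static char buf[64 * 1024];   /* arbitrary chunk size */
    long long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* Advisory only: tell the kernel we will not reread this range.
         * The return value is ignored because a refusal is harmless. */
        (void)posix_fadvise(fd, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
        total += n;
    }

    close(fd);
    return n < 0 ? -1 : total;
}
```

A real checker would do something with each chunk before dropping it; the
point is only that the hint is issued per range, after the data has been used.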

Actually I guess what is needed is not necessarily a new device, but possibly
an extra (non-portable, but hey) flag for open (and maybe mmap?) to say `don't
cache this, I'm not going to see it again'. The device is just a way of
saying this without having to code it in the program.
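
As a sketch of the open-flag idea: later Linux kernels did grow a flag along
these lines, O_DIRECT, which asks for I/O to bypass the cache and go to or
from the user's buffer. The code below assumes such a kernel and glibc. The
helper name read_uncached, the 4096-byte ALIGNMENT and the fallback to a
normal cached open are all my own assumptions, not anything that existed when
this was written. O_DIRECT generally wants the buffer, offset and length
aligned to the device block size, and some filesystems refuse it outright,
hence the fallback.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT with glibc on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* no such flag here: degrade to cached reads */
#endif

#define ALIGNMENT 4096         /* assumed block size; real devices may differ */

/* Hypothetical helper: read up to len bytes from the start of path,
 * bypassing the cache if the system allows it. len must be a non-zero
 * multiple of ALIGNMENT. On success, returns the byte count, stores an
 * aligned buffer in *out (caller frees it) and sets *direct to 1 if the
 * uncached path was used, 0 if we fell back. Returns -1 on error. */
ssize_t read_uncached(const char *path, size_t len, void **out, int *direct)
{
    const int flags[2] = { O_RDONLY | O_DIRECT, O_RDONLY };
    void *buf = NULL;

    assert(len > 0 && len % ALIGNMENT == 0);
    if (posix_memalign(&buf, ALIGNMENT, len) != 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        int fd = open(path, flags[i]);
        if (fd < 0)
            continue;          /* e.g. filesystem rejects O_DIRECT */
        ssize_t n = read(fd, buf, len);
        close(fd);
        if (n >= 0) {
            *out = buf;
            *direct = (i == 0 && O_DIRECT != 0);
            return n;
        }
    }

    free(buf);
    return -1;
}
```

The program still has to opt in, which is exactly the trade-off against a
separate device node: with the device, an unmodified program gets the
behaviour just by being pointed at a different name.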

Am I missing something really obvious here? (Note: even if mmap didn't trash
the cache, which I am certain it does (page cache?), it doesn't work for this.
Try mmapping something >4GB :-( )

David