Re: raw devices

From: Andrea Arcangeli (andrea@suse.de)
Date: Thu Dec 20 2001 - 12:05:47 EST


On Thu, Dec 20, 2001 at 12:22:33PM -0000, simon@baydel.com wrote:
> I have been performing some tests on raw devices, the results of
> which I do not understand. I have 4 FC busses, each with one
> SCSI disk, connected to a Linux system running 2.4.16. I use the raw
> command to bind /dev/raw1 - /dev/raw4 to each of the devices.
> With one process per raw device, running large sequential reads, I
> got a total throughput of 340 Megabytes per second. I also
> observed 85% CPU idle. Following this I performed some more
> tests and then returned to this one. This time the total had gone
> down to 180 and there was no free CPU. I realized that the first
> time I ran the tests, each of the disks that the raw devices were
> mapped to were mounted. I then verified that this data was being
> transferred along the FC bus using an analyzer while the devices
> were mounted. Can anyone explain this to me ? I find it hard to
> believe that the disk should be permitted to be mounted when
> using raw device mappings. If the disks should not be mounted
> why is there such a great performance difference ?

because when mounted rawio uses a blocksize larger than the
softblocksize (for no good reason, but that's another issue).

in short, when the disk was mounted a single bh was doing I/O on a page (I
guess you were using 4k blocksize in the fs), while when it was
unmounted each bh was doing I/O on only 512 bytes, so you needed 8 times
more bh and CPU to do the same I/O.
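
To make the factor of 8 concrete, here is a minimal sketch of the
arithmetic, assuming one bh per block and the 4k vs 512 byte blocksizes
guessed above. bh_count() and bh_ratio() are hypothetical helper names
for illustration only, they are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Number of buffer heads needed to cover `bytes` of I/O, assuming each
 * buffer head maps exactly one block of `blocksize` bytes (rounded up). */
static size_t bh_count(size_t bytes, size_t blocksize)
{
    assert(blocksize > 0);
    return (bytes + blocksize - 1) / blocksize;
}

/* How many times more buffer heads the small blocksize needs than the
 * large one for the same amount of I/O. */
static size_t bh_ratio(size_t bytes, size_t small_bs, size_t large_bs)
{
    return bh_count(bytes, small_bs) / bh_count(bytes, large_bs);
}
```

For one 4096 byte page, bh_count(4096, 4096) is 1 and bh_count(4096, 512)
is 8, so every page read through the unmounted raw device costs eight bh
submissions and completions instead of one.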

there are many ways to fix this, I guess it would be nice to get the bio
stuff optimized in 2.5 first (bio will be even faster than your rawio
with the fs mounted, because a single metadata entity will be able to do
I/O on more than one page, that's the whole point of the bio design
changes). Then in 2.5 rawio can also be obsoleted and modularized.
O_DIRECT will have to relax its granularity requirements for both fs and
blkdev, and then it will entirely overlap the rawio chardevice
functionality.

btw, if you try to use O_DIRECT on the fs with your current kernel, you
should already be able to read at 340 Mbyte/sec with most of the CPU idle.
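
A rough userspace sketch of such an O_DIRECT sequential read follows. It
is only an illustration under assumptions: direct_read_total() is a
hypothetical helper name, and the caller is assumed to pass a chunk size
and buffer alignment that satisfy the O_DIRECT granularity requirements
of the fs (for example 4096 for a 4k blocksize fs).

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Sequentially read `path` with O_DIRECT in `chunk`-byte requests.
 * `align` is the buffer alignment (assumed: a power of two that matches
 * the fs blocksize). Returns the total bytes read, or -1 on error. */
static long long direct_read_total(const char *path, size_t chunk, size_t align)
{
    void *buf = NULL;
    long long total = 0;
    ssize_t n;
    int fd;

    assert(chunk > 0);

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* O_DIRECT bypasses the page cache, so the user buffer must be aligned. */
    if (posix_memalign(&buf, align, chunk) != 0) {
        close(fd);
        return -1;
    }

    /* Read until end of file (n == 0) or an error (n < 0). */
    while ((n = read(fd, buf, chunk)) > 0)
        total += n;

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

Calling direct_read_total() on a large file with a 1 Mbyte chunk and 4096
byte alignment, one process per disk, would be the closest equivalent to
the raw device test above.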

Andrea