Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)

From: Linus Torvalds
Date: Wed Nov 28 2012 - 15:13:38 EST


On Wed, Nov 28, 2012 at 12:03 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> mmap() is in *no* way special. The exact same thing happens for
> regular read/write. Yet somehow the mmap code is special-cased, while
> the normal read-write code is not.

I just double-checked, because it's been a long time since I actually
looked at the code.

But yeah, block device read/write uses the pure page cache functions.
IOW, it has the *exact* same IO engine as mmap() would have.

So here's my suggestion:

- get rid of *all* the locking in aio_read/write and the splice paths
- get rid of all the stupid mmap games

- instead, add them to the functions that actually use
"blkdev_get_block()" and "blkdev_get_blocks()" and nowhere else.

That's a fairly limited number of functions:
blkdev_{read,write}page(), blkdev_direct_IO() and
blkdev_write_{begin,end}()

Doesn't that sound simpler? And more logical: it protects the actual
places that use the block size of the device.

I dunno. Maybe there is some fundamental reason why the above is
broken, but it seems to be a much simpler approach. Sure, you need to
guarantee that the people who get the write-lock cannot possibly cause
IO while holding it, but since the only reason to get the write lock
would be to change the block size, that should be pretty simple, no?

Yeah, yeah, I'm probably missing something fundamental, but the above
sounds like the simple approach to fixing things. Aiming for having
the block size read-lock be taken by the things that pass in the
block-size itself.

It would be nice for things to be logical and straightforward.

Linus