Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait

From: Stephen C. Tweedie (
Date: Tue Feb 06 2001 - 20:49:28 EST


On Tue, Feb 06, 2001 at 04:50:19PM -0800, Linus Torvalds wrote:
> On Wed, 7 Feb 2001, Stephen C. Tweedie wrote:
> >
> > That gets us from 512-byte blocks to 4k, but no more (ll_rw_block
> > enforces a single blocksize on all requests but that relaxing that
> > requirement is no big deal). Buffer_heads can't deal with data which
> > spans more than a page right now.
> "struct buffer_head" can deal with pretty much any size: the only thing it
> cares about is bh->b_size.

Right now, anything larger than a page is physically non-contiguous,
and sorry if I didn't make that explicit, but I thought that was
obvious enough that I didn't need to. We were talking about raw IO,
and as long as we're doing IO out of user anonymous data allocated
from individual pages, buffer_heads are limited to that page size in
this context.

> Have you ever spent even just 5 minutes actually _looking_ at the block
> device layer, before you decided that you think it needs to be completely
> re-done some other way? It appears that you never bothered to.

Yes. We still have this fundamental property: if a user sends in a
128kB IO, we end up having to split it up into buffer_heads and doing
a separate submit_bh() on each single one. Given our VM, PAGE_SIZE
(*not* PAGE_CACHE_SIZE) is the best granularity we can hope for in
this case.

THAT is the overhead that I'm talking about: having to split a large
IO into small chunks, each of which just ends up having to be merged
back again into a single struct request by the *make_request code.

A constructed IO request basically doesn't care about anything in the
buffer_head except for the data pointer and size, and the completion
status info and callback. All of the physical IO description is in
the struct request by this point. The chain of buffer_heads is
carrying around a huge amount of information which isn't used by the
IO, and if the caller is something like the raw IO driver which isn't
using the buffer cache, that extra buffer_head data is just overhead.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
Please read the FAQ at

This archive was generated by hypermail 2b29 : Wed Feb 07 2001 - 21:00:25 EST