Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

From: Stephen C. Tweedie
Date: Thu Feb 01 2001 - 16:25:08 EST


On Thu, Feb 01, 2001 at 09:46:27PM +0100, Christoph Hellwig wrote:

> > Right now we can take a kiobuf and turn it into a bunch of
> > buffer_heads for IO. The io_count lets us track all of those sub-IOs
> > so that we know when all submitted IO has completed, so that we can
> > pass the completion callback back up the chain without having to
> > allocate yet more descriptor structs for the IO.
> > Again, remove this and the IO becomes more heavyweight because we need
> > to create a separate struct for the info.
> No. Just allow passing the multiple of the devices blocksize over
> ll_rw_block.

That was just one example: you need the sub-IOs just as much when
you split up an IO over stripe boundaries in LVM or raid0.
Secondly, ll_rw_block needs to die anyway: you can expand
the blocksize up to PAGE_SIZE but not beyond, whereas something like
ll_rw_kiobuf can submit a much larger IO atomically (and we have
devices which don't start to deliver good throughput until you use
IO sizes of 1MB or more).
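
To make the sub-IO point concrete, here is a rough user-space sketch of
carving one request up at stripe boundaries and counting the pieces so
that the parent's completion callback fires only once, after the last
piece finishes. Everything here is hypothetical and for illustration
only: struct parent_io, struct sub_io, split_on_stripes(), sub_io_done()
and mark_completed() are made-up names, not kernel interfaces, and the
io_count field merely borrows the name used earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parent request: one counter covers every sub-IO. */
struct parent_io {
    int  io_count;                        /* sub-IOs still outstanding */
    void (*end_io)(struct parent_io *);   /* called once, at the end */
    int  completed;                       /* set by mark_completed() */
};

/* Hypothetical sub-IO descriptor: a byte range within one stripe. */
struct sub_io {
    size_t start;
    size_t len;
};

/*
 * Split [start, start + len) so that no piece crosses a stripe boundary.
 * Writes at most max pieces to out and returns how many were written;
 * if max is too small the tail of the range is simply not emitted.
 */
static size_t split_on_stripes(size_t start, size_t len, size_t stripe,
                               struct sub_io *out, size_t max)
{
    size_t n = 0;

    assert(stripe > 0);
    while (len > 0 && n < max) {
        size_t room  = stripe - (start % stripe);  /* bytes left in stripe */
        size_t chunk = len < room ? len : room;

        out[n].start = start;
        out[n].len   = chunk;
        n++;
        start += chunk;
        len   -= chunk;
    }
    return n;
}

/* Example completion callback for the parent request. */
static void mark_completed(struct parent_io *p)
{
    p->completed = 1;
}

/* Called once per finished sub-IO; the last one completes the parent. */
static void sub_io_done(struct parent_io *p)
{
    assert(p->io_count > 0);
    if (--p->io_count == 0 && p->end_io)
        p->end_io(p);
}
```

With a 64-byte stripe, a 100-byte request starting at offset 48 splits
into three pieces (16, 64 and 20 bytes). Setting io_count to 3 and
calling sub_io_done() once per piece runs the callback on the third
call only, with no per-piece descriptor carrying completion state.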

> >> and the lack of
> >> scatter gather in one kiobuf struct (you always need an array)
> > Again, _all_ data being sent down through the block device layer is
> > either in buffer heads or is page aligned.
> That's the point. You are always talking about the block-layer only.

I'm talking about why the minimal, generic solution doesn't provide
what the block layer needs.

> > Obviously, extra code will be needed to scan kiobufs if we do that,
> > and unless we have both per-page _and_ per-kiobuf start/offset pairs
> > (adding even further to the complexity), those scatter-gather lists
> > would prevent us from carving up a kiobuf into smaller sub-ios without
> > copying the whole (expanded) vector.
> No. I think I explained that in my last mail.


If I've got a vector (page X, offset 0, length PAGE_SIZE) and I want
to split it in two, I have to make two new vectors (page X, offset 0,
length n) and (page X, offset n, length PAGE_SIZE-n). That implies
copying both vectors.

If I have a page vector with a single offset/length pair, I can build
a new header with the same vector and modified offset/length to split
the vector in two without copying it.
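
As a rough illustration of that second case, the following user-space
sketch splits a page vector that carries a single offset/length pair.
The names (struct io_vec_hdr, io_vec_split(), SKETCH_PAGE_SIZE) are
hypothetical and are not the kiobuf definitions; the 4096-byte page
size is an assumption. The point is only that both halves alias the
same page array and differ in a few header fields.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical header: shared page array plus ONE offset/length pair. */
struct io_vec_hdr {
    void  **pages;      /* shared page array, never copied on split */
    size_t  nr_pages;   /* pages spanned by this header */
    size_t  offset;     /* byte offset into the first page */
    size_t  length;     /* total bytes described */
};

/* Number of pages touched by length bytes starting at offset. */
static size_t pages_spanned(size_t offset, size_t length)
{
    if (length == 0)
        return 0;
    return (offset + length + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Split src after n bytes. head and tail are new headers only: both
 * point into src->pages, so no page vector is copied.
 */
static void io_vec_split(const struct io_vec_hdr *src, size_t n,
                         struct io_vec_hdr *head, struct io_vec_hdr *tail)
{
    size_t split;   /* byte position of the split, from start of page 0 */

    assert(n <= src->length);
    split = src->offset + n;

    head->pages    = src->pages;
    head->offset   = src->offset;
    head->length   = n;
    head->nr_pages = pages_spanned(head->offset, head->length);

    tail->pages    = src->pages + split / SKETCH_PAGE_SIZE;
    tail->offset   = split % SKETCH_PAGE_SIZE;
    tail->length   = src->length - n;
    tail->nr_pages = pages_spanned(tail->offset, tail->length);
}
```

Splitting (page X, offset 0, length PAGE_SIZE) at n this way yields
headers (offset 0, length n) and (offset n, length PAGE_SIZE-n) that
both still point at the original array holding page X.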

> > Possibly, but I remain to be convinced, because you may end up with a
> > mechanism which is generic but is not well-tuned for any specific
> > case, so everything goes slower.
> As kiobufs are widely used for real IO, just as containers, this is
> better than nothing.

Surely having all of the subsystems working fast is better still?

> And IMHO a nice generic concept that lets different subsystems work
> together is a _lot_ better than a bunch of over-optimized, rather isolated
> subsystems. The IO-Lite people have done some nice research on the effect of
> a unified IO-caching system vs. the typical isolated systems.

I know, and IO-Lite has some major problems (the close integration of
that code into the cache, for example, makes it harder to expose the
zero-copy to user-land).

