Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait

From: David Lang
Date: Mon Feb 05 2001 - 16:08:17 EST

So you have two concepts in one here:

1. SG (scatter-gather) items that can be larger than a single page

2. A container for #1 that includes the details for completion callbacks, etc.
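
To make item #1 concrete, here is a minimal C sketch. The `struct sg_item` layout, its `addr`/`length` fields and `SKETCH_PAGE_SIZE` are hypothetical names chosen for illustration, and the range is assumed to be physically contiguous; the only point is that one item may cover several pages rather than exactly one.

```c
#include <assert.h>

/* Hypothetical page size for illustration; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Concept #1: one scatter-gather item. The range is assumed to be
 * physically contiguous, so its length is not capped at one page. */
struct sg_item {
    unsigned long addr;    /* hypothetical start address of the data */
    unsigned long length;  /* bytes; may exceed SKETCH_PAGE_SIZE */
};

/* Number of pages the item touches: index of the page holding the last
 * byte minus index of the page holding the first byte, plus one. */
static unsigned long sg_item_pages(const struct sg_item *it)
{
    unsigned long first, last;

    if (it->length == 0)
        return 0;
    assert(it->addr + it->length > it->addr);   /* no address wraparound */
    first = it->addr / SKETCH_PAGE_SIZE;
    last = (it->addr + it->length - 1) / SKETCH_PAGE_SIZE;
    return last - first + 1;
}
```

For example, a 4096-byte item that starts 100 bytes into a page straddles two pages.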

It looks like Linus is objecting to having both in the same structure and
then using that structure as your generic low-level bucket.

Define these as two separate structures. The #1 structure may then be
lightweight enough to be used for networking and other functions, and
when you use it for disk IO you wrap it in the #2 structure. This still
lets you have the completion callbacks at as low a level as you want;
you just have to add that layer explicitly where it makes sense.
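
A rough C sketch of the split, under stated assumptions: `struct sg_item`, `struct io_container`, `io_done_fn` and the helper functions are all hypothetical names, not existing kernel interfaces. The bare vector is what networking or the SG setup code would see; disk IO wraps it in the container that carries the completion callback.

```c
#include <assert.h>
#include <stddef.h>

/* #1: lightweight vector element, usable by networking and disk IO alike. */
struct sg_item {
    unsigned long addr;    /* hypothetical start address */
    unsigned long length;  /* may cover more than one page */
};

/* #2: disk-IO wrapper that adds completion state around a vector of #1. */
struct io_container;
typedef void (*io_done_fn)(struct io_container *io, int error);

struct io_container {
    struct sg_item *items;    /* borrowed vector; not copied */
    size_t          nr_items;
    io_done_fn      done;     /* completion callback */
    void           *cb_data;  /* caller context passed through to done() */
};

/* Code that only moves data sees the bare vector. */
static unsigned long sg_total_bytes(const struct sg_item *items, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += items[i].length;
    return total;
}

/* The disk IO path adds the completion layer explicitly. */
static void io_container_init(struct io_container *io, struct sg_item *items,
                              size_t n, io_done_fn done, void *cb_data)
{
    assert(done != NULL);
    io->items = items;
    io->nr_items = n;
    io->done = done;
    io->cb_data = cb_data;
}

/* Called by the (hypothetical) driver when the whole transfer finishes. */
static void io_container_complete(struct io_container *io, int error)
{
    io->done(io, error);
}

/* Example callback: records the outcome in an int supplied as cb_data
 * (1 for success, otherwise the error code). */
static void store_result(struct io_container *io, int error)
{
    *(int *)io->cb_data = error ? error : 1;
}
```

Because the container only borrows the vector, the same `sg_item` array could be handed to a non-disk path with no completion overhead at all.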

David Lang

On Mon, 5 Feb 2001, Stephen C. Tweedie wrote:

> Date: Mon, 5 Feb 2001 20:54:29 +0000
> From: Stephen C. Tweedie
> To: Linus Torvalds
> Cc: Alan Cox, Stephen C. Tweedie,
> Manfred Spraul,
> Christoph Hellwig, Steve Lord,
> Subject: Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait
> Hi,
> On Mon, Feb 05, 2001 at 11:28:17AM -0800, Linus Torvalds wrote:
> > The _vectors_ are needed at the very lowest levels: the levels that do not
> > necessarily have to worry at all about completion notification etc. You
> > want the arbitrary scatter-gather vectors passed down to the stuff that
> > sets up the SG arrays etc, the stuff that doesn't care AT ALL about the
> > high-level semantics.
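
What Linus describes at the bottom layer can be sketched as a function that turns arbitrary-length vectors into a controller's fixed-size SG descriptors. Everything here (`struct hw_sg_desc`, `build_hw_sg`, the `max_seg` limit) is a hypothetical illustration assuming a controller that caps each descriptor's length; note that no completion state appears anywhere.

```c
#include <assert.h>
#include <stddef.h>

struct sg_item {            /* arbitrary-length vector element (hypothetical) */
    unsigned long addr;     /* bus address; range assumed contiguous */
    unsigned long length;
};

struct hw_sg_desc {         /* hypothetical controller descriptor */
    unsigned long addr;
    unsigned long length;
};

/* Fill a hardware SG table from a vector, splitting any item longer than
 * max_seg. Returns the number of descriptors used, or -1 if the table is
 * too small. No completion state is involved at this level. */
static int build_hw_sg(const struct sg_item *items, size_t nr_items,
                       unsigned long max_seg,
                       struct hw_sg_desc *table, size_t table_len)
{
    size_t used = 0;

    assert(max_seg > 0);
    for (size_t i = 0; i < nr_items; i++) {
        unsigned long addr = items[i].addr;
        unsigned long left = items[i].length;

        while (left > 0) {
            unsigned long chunk = left < max_seg ? left : max_seg;

            if (used == table_len)
                return -1;          /* caller must split the request */
            table[used].addr = addr;
            table[used].length = chunk;
            used++;
            addr += chunk;
            left -= chunk;
        }
    }
    return (int)used;
}
```

With `max_seg` of 4096, a 10000-byte item followed by a 512-byte item fills four descriptors (4096, 4096, 1808 and 512 bytes).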
> OK, this is exactly where we have a problem: I can see too many cases
> where we *do* need to know about completion stuff at a fine
> granularity when it comes to disk IO (unlike network IO, where we can
> usually rely on a caller doing retransmit at some point in the stack).
> If we are doing readahead, we want completion callbacks raised as soon
> as possible on IO completions, no matter how many other IOs have been
> merged with the current one. More importantly though, when we are
> merging multiple page or buffer_head IOs in a request, we want to know
> exactly which buffer/page contents are valid and which are not once
> the IO completes.
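
One way to picture the per-buffer validity Stephen wants is a merged request that still remembers the length of each buffer it absorbed. This sketch assumes, hypothetically, that the driver reports partial success as a count of bytes transferred correctly from the start of the request; `struct seg` and `complete_merged` are illustrative names, not the buffer_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one buffer merged into a larger request. */
struct seg {
    size_t len;       /* bytes this buffer contributes to the request */
    int    uptodate;  /* set on completion: 1 = data valid, 0 = not */
};

/* Complete a merged request whose device reported that only the first
 * bytes_ok bytes transferred correctly. A buffer is valid only if it lies
 * entirely inside that prefix. Returns the number of valid buffers. */
static size_t complete_merged(struct seg *segs, size_t n, size_t bytes_ok)
{
    size_t pos = 0, valid = 0;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        pos += segs[i].len;              /* end offset of buffer i */
        segs[i].uptodate = (pos <= bytes_ok);
        valid += (size_t)segs[i].uptodate;
    }
    return valid;
}
```

With three 1024-byte buffers and 1500 good bytes, only the first buffer is marked valid; the second, though partly transferred, is not.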
> The current request struct's buffer_head list provides that quite
> naturally, but is a hugely heavyweight way of performing large IOs.
> What I'm really after is a way of sending IOs to make_request in such
> a way that if the caller provides an array of buffer_heads, it gets
> back completion information on each one, but if the IO is requested in
> large chunks (e.g. XFS's pagebufs or large kiobufs from raw IO), then
> the request code can deal with it in those large chunks.
> What worries me is things like the soft raid1/5 code: pretending that
> we can skimp on the return information about which blocks were
> transferred successfully and which were not sounds like a really bad
> idea when you've got a driver which relies on that completion
> information in order to do intelligent error recovery.
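
The raid1 case can be sketched as follows: with per-block status, a failed read only needs to re-read the failed blocks from another mirror. The `struct raid1_model` failure table and `raid1_read` are hypothetical stand-ins for a real driver and device, used only to show the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

#define NR_MIRRORS 2

/* Hypothetical test model: bad[m][b] is nonzero when block b cannot be
 * read from mirror m. */
struct raid1_model {
    size_t               nr_blocks;
    const unsigned char *bad[NR_MIRRORS];
};

/* Read all blocks from mirror 0, then retry only the blocks that failed
 * on each remaining mirror in turn. ok[b] is set to 1 if block b was read
 * successfully from some mirror. Returns the number of retry reads. */
static size_t raid1_read(const struct raid1_model *r, unsigned char *ok)
{
    size_t retried = 0;

    assert(r != NULL && ok != NULL);
    for (size_t b = 0; b < r->nr_blocks; b++)
        ok[b] = !r->bad[0][b];

    for (size_t m = 1; m < NR_MIRRORS; m++) {
        for (size_t b = 0; b < r->nr_blocks; b++) {
            if (ok[b])
                continue;             /* already good: do not re-read */
            retried++;
            ok[b] = !r->bad[m][b];
        }
    }
    return retried;
}
```

If mirror 0 fails blocks 1 and 3 and mirror 1 fails only block 3, two blocks are retried and block 1 is recovered; without per-block status the driver could not tell which of the four blocks to retry.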
> Cheers,
> Stephen
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to
> Please read the FAQ at

This archive was generated by hypermail 2b29 : Wed Feb 07 2001 - 21:00:22 EST