Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait /notify + callback chains

From: Stephen C. Tweedie
Date: Mon Feb 05 2001 - 17:58:04 EST


On Mon, Feb 05, 2001 at 10:28:37PM +0100, Ingo Molnar wrote:
> On Mon, 5 Feb 2001, Stephen C. Tweedie wrote:
> it's exactly these 'compound' structures i'm vehemently against. I do
> think it's a design nightmare. I can picture these monster kiobufs
> complicating the whole code for no good reason - we couldnt even get the
> bh-list code in block_device.c right - why do you think kiobufs *all
> across the kernel* will be any better?
> RAID0 is not an issue. Split it up, use separate kiobufs for every
> different disk.

Umm, that's not the point --- of course you can use separate kiobufs
for the communication between raid0 and the underlying disks, but what
do you then tell the application _above_ raid0 if one of the
underlying IOs succeeds and the other fails halfway through?

And what about raid1? Are you really saying that raid1 doesn't need
to know which blocks succeeded and which failed? That's the level of
completion information I'm worrying about at the moment.
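
To make the raid1 case concrete, here is a minimal sketch in C of the bookkeeping a mirroring layer might want once per-block results are available. It is an illustration only: the struct, the bitmask encoding and both helper names are hypothetical, and none of this is taken from the md or kiobuf code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MIRRORS 2   /* assumption: a simple two-way mirror */

/* Hypothetical per-write result: bit i set in failed[m] means
 * block i of the write failed on mirror m. */
struct mirror_write_result {
    uint32_t failed[NR_MIRRORS];
};

/* Blocks that failed on every mirror: no good copy exists, so the
 * failure has to be reported to the caller above raid1. */
static uint32_t blocks_lost(const struct mirror_write_result *r)
{
    uint32_t lost = ~(uint32_t)0;
    for (int m = 0; m < NR_MIRRORS; m++)
        lost &= r->failed[m];
    return lost;
}

/* Blocks that failed on mirror m but succeeded on some other mirror:
 * the data is safe, but mirror m is stale for these blocks. */
static uint32_t blocks_to_resync(const struct mirror_write_result *r, int m)
{
    assert(m >= 0 && m < NR_MIRRORS);
    return r->failed[m] & ~blocks_lost(r);
}
```

Neither question can be answered from a single success/fail flag for the whole IO, which is the point being argued.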

> fragmented skbs are a different matter: they are simply a bit more generic
> abstractions of 'memory buffer'. Clear goal, clear solution. I do not
> think kiobufs have clear goals.

The goal: allow arbitrary IOs to be pushed down through the stack in
such a way that the callers can get meaningful information back about
what worked and what did not. If the write was a 128kB raw IO, then
you obviously get coarse granularity of completion callback. If the
write was a series of independent pages which happened to be
contiguous on disk, you actually get told which pages hit disk and
which did not.
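
As a rough illustration of that goal, the following C sketch records an outcome per page and fires a single completion callback once every page has been accounted for. struct io_sketch and its helpers are hypothetical stand-ins written for this example; they are not the real struct kiobuf or its API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 32   /* hypothetical limit for this sketch */

enum page_status { PAGE_PENDING = 0, PAGE_OK, PAGE_FAILED };

/* Hypothetical IO container, not the real struct kiobuf. */
struct io_sketch {
    size_t nr_pages;
    size_t nr_done;
    enum page_status status[MAX_PAGES];
    /* Called once, after every page has completed either way. */
    void (*end_io)(struct io_sketch *io);
};

static void io_sketch_init(struct io_sketch *io, size_t nr_pages,
                           void (*end_io)(struct io_sketch *io))
{
    assert(nr_pages <= MAX_PAGES);
    io->nr_pages = nr_pages;
    io->nr_done = 0;
    io->end_io = end_io;
    for (size_t i = 0; i < nr_pages; i++)
        io->status[i] = PAGE_PENDING;
}

/* The lower layer reports the outcome of one page. */
static void io_sketch_complete_page(struct io_sketch *io, size_t idx, int ok)
{
    assert(idx < io->nr_pages && io->status[idx] == PAGE_PENDING);
    io->status[idx] = ok ? PAGE_OK : PAGE_FAILED;
    if (++io->nr_done == io->nr_pages && io->end_io)
        io->end_io(io);
}

/* The caller asks how many pages did not hit disk. */
static size_t io_sketch_count_failed(const struct io_sketch *io)
{
    size_t n = 0;
    for (size_t i = 0; i < io->nr_pages; i++)
        if (io->status[i] == PAGE_FAILED)
            n++;
    return n;
}
```

A 128kB raw IO would simply be submitted as one unit with one status, while a run of independent pages would get one status slot each.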

> and what is the goal of having multi-page kiobufs. To avoid having to do
> multiple function calls via a simpler interface? Shouldnt we optimize that
> codepath instead?

The original multi-page buffers came from the map_user_kiobuf
interface: they represented a user data buffer. I'm not wedded to
that format --- we can happily replace it with a fine-grained sg list
--- but the reason they have been pushed so far down the IO stack is
the need for accurate completion information on the originally
requested IOs.

In other words, even if we expand the kiobuf into a sg vector list,
when it comes to merging requests in ll_rw_blk.c we still need to
track the callbacks on each independent source kiobuf.
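
A minimal sketch of what that tracking could look like, in C: a merged request keeps one segment per source IO, and on completion each source is told individually whether its part made it. All names here are hypothetical, and the "device stops after bytes_done bytes" failure model is an assumption made for the example, not a description of ll_rw_blk.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOURCES 8   /* hypothetical limit for this sketch */

/* Hypothetical stand-in for one originally submitted IO. */
struct source_io {
    int completed;   /* set once this source has been notified */
    int error;       /* 0 on success, -1 on failure */
};

struct merged_segment {
    struct source_io *src;   /* who to notify for this segment */
    size_t len;              /* bytes this source contributed */
};

/* One merged request built from several contiguous sources. */
struct merged_request {
    size_t nr_segments;
    struct merged_segment seg[MAX_SOURCES];
};

static int merged_request_add(struct merged_request *rq,
                              struct source_io *src, size_t len)
{
    assert(len > 0);
    if (rq->nr_segments == MAX_SOURCES)
        return -1;
    rq->seg[rq->nr_segments].src = src;
    rq->seg[rq->nr_segments].len = len;
    rq->nr_segments++;
    return 0;
}

/* Assumed failure model: the device transferred the first bytes_done
 * bytes and then stopped. A source succeeds only if its whole segment
 * lies inside the transferred prefix. */
static void merged_request_complete(struct merged_request *rq,
                                    size_t bytes_done)
{
    size_t offset = 0;
    for (size_t i = 0; i < rq->nr_segments; i++) {
        struct merged_segment *s = &rq->seg[i];
        s->src->error = (offset + s->len <= bytes_done) ? 0 : -1;
        s->src->completed = 1;
        offset += s->len;
    }
}
```

The merge still happens, so the device sees one large request, but the completion path fans back out to the independent sources.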

