Re: [PATCH] Bio Traversal Changes

From: Suparna Bhattacharya (
Date: Tue Aug 06 2002 - 07:17:25 EST

On Mon, Aug 05, 2002 at 10:47:39AM -0500, James Bottomley wrote:
> said:
> > There is only one call to ->request_fn for the entire request, and the
> > drivers manages things underneath. The chunks are expected to complete
> > sequentially. In the situation where the request is restarted in the
> > event of an error (say), the submission pointers are rolled back to
> > the last (successfully) completed point before issuing the request
> > again.
> Yes, that's the way I thought it would operate.
> said:
> > I must say that I initially did think that this could be extended to
> > the more generic case which you probably are referring to and that
> > such an approach could take away the need to split bios in certain
> > cases (i.e. when the i/o is destined for a single queue). Later it
> > appeared that trying to cover the case where each of these pieces
> > gets queued up and might complete out of order (requiring a tag to
> > correlate things on completion), would most likely boil down to
> > trying to maintain all the state that struct request does today.
> For this more generic case, most of our problems seem to be because the
> barrier has width: It actually belongs to an I/O request. If the barrier had
> zero width (i.e. it was simply a barrier in the stream with no I/O attached)
> then it would be much easier to preserve it correctly across this (or any
> other) type of bio splitting. It would also make it much more obvious to the
> implementing driver where the barrier was supposed to be in the I/O stream,
> and would allow more efficient "wait for completion" barrier implementations
> for drivers that couldn't enforce it any other way.
> > Would be nice (for me) to understand this in more detail. There might
> > be some possibilities. Any pointers that I can look up to get a
> > clearer idea ?
> The SCSI standards ( are the only real authoritative source (with
> even some explanation). However, I'll do my best to summarise.
> In SCSI, commands are allowed to disconnect, that is suspend temporarily while
> the device does other things. When the device implements tag command
> queueing, it is allowed to disconnect one command and subsequently reconnect
> (restart) a different one. In theory, this means that we can have multiple
> active I/Os at once. The way you signal to the scsi device that you want a
> barrier is to label one or more of the tags as "ordered" which means that the
> device must complete all I/O of tags prior to the ordered one before it and
> may not begin I/O of subsequent tags until the ordered tag has completed.
> looping a single request over a big bio means that the SCSI device sees the
> I/O as a discrete stream of tags. However, we lose throughput if we stall the
> queue waiting for this single bio to complete and we can't work out what the
> next tag is until the prior tag completes. In the non barrier case,
> everything will still be OK as long as the queue isn't stalled because we'll
> be getting throughput from other bios coming down.
> I think basically, I'd like to translate as much of the bio as I can into SCSI
> tags to improve throughput and each tag currently requires a struct request.

I didn't think of the possibility of serializing the chunks
of a single request, while letting other requests on the queue through
in the no barrier situation. That's a thought, though it might result
in non-optimal scans ... and in that sense affect the throughput.
But, now I see why the barrier case was the one you were mainly worried

> > Does completion notification happen only when all the commands
> > covered by a single tag complete ? Otherwise, what is the ordering
> > amongst the multiple commands in question (do they complete in serial
> > order as well) ?
> Yes and no. You get a special completion code (INTERMEDIATE_TASK_COMPLETE)
> which says "I've finished this bit, give me the next part". You don't get a
> real SCSI completion until the last part of the linked task set completes.
> The task is linked sequentially, so it does complete in serial order.

Thanks for the explanation. I think I get the gist.

> However, Don't worry about the linked task stuff, it's a rather esoteric area
> of the SCSI standard (that allows a single tag to be used across multiple I/Os
> in very much the same way the bio splitting works) which, on mature
> reflection, probably isn't such a good idea to use since I'd be doubtful about
> how well it's implemented in the devices we have to deal with.



> James
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Wed Aug 07 2002 - 22:00:31 EST