Re: patch: aio + bio for raw io

From: Linus Torvalds (torvalds@transmeta.com)
Date: Fri Feb 08 2002 - 17:54:51 EST


In article <20020208171327.B12788@redhat.com>,
Benjamin LaHaise <bcrl@redhat.com> wrote:
>On Fri, Feb 08, 2002 at 01:07:58PM -0800, Badari Pulavarty wrote:
>> I am looking at the 2.5 patch you sent out. I have few questions/comments:
>>
>> 1) brw_kvec_async() does not seem to split IO at BIO_MAX_SIZE. I thought
>> each bio can handle only BIO_MAX_SIZE (ll_rw_kio() is creating one bio
>> for each BIO_MAX_SIZE IO).
>
>Sounds like a needless restriction in bio, especially as one of the design
>requirements for the 2.5 block work is that we're able to support large ios
>(think 4MB page support).

bio can handle arbitrarily large IO's, BUT it can never split them.

Basically, IO splitting is NOT the job of the IO layer. So you can make
any size request you want, but you had better know that the hardware you
send it to can take it. The bio layer basically guarantees only that
you can send a single contiguous request of PAGE_SIZE, nothing more (in
fact, we might at some point get away from even that, and only guarantee
sectors - with things like loopback remapping etc you might have trouble
even for "contiguous" requests).

Now, before you say "that's stupid, I don't know what the driver limits
are", ask yourself:
 - what is it that you want to go fast?
 - what is it that you CAN make fast?

The answer to the "want" question is: the common case. And like it or
not, the common case is never going to be 4MB pages.

The answer to the "can" question is: merging can be fast, splitting
fundamentally cannot.

Splitting a request _fundamentally_ involves memory management (at the
very least you have to allocate a new request), while growing a request
can (and does) mean just adding an entry to the end of a list (until you
cannot grow it any more, of course, but that's the point where you have
to end anyway, so..)

Now, think about that for five minutes, and if you don't come back with
the right answer, you get an F.

In short:

 - the right answer to handling 4MB pages is not to push complexity into
   the low-level drivers and make them try to handle requests that are
   bigger than the hardware can do.

   In fact, we don't even want to handle it in the mid layers, because
   (a) the mid layers have historically been even more flaky than some
   device drivers and (b) it's a performance loss to even test for the
   common case where the splitting is neither needed nor wanted.

 - the _right_ answer to handling big areas is to build up big bio's
   from smaller ones. And no, you don't have to call the elevator in
   between requests that you know are consecutive on the disk.

   Another way of saying it: if you have 4MB worth of IO, it's YOUR
   resposibility to do the work to make it fit the controller. It is off
   the default case, and _you_ do a bit of extra work instead of asking
   everybody else to do your heavy lifting for you.

Does bio have the interfaces to do this yet? No. But if you think that
bio's should natively handle any kind of request at all, you're really
barking up the wrong tree.

If you are in the small small _small_ minority care about 4MB requests,
you should build the infrastructure not to make drivers split them, but
to build up a list of bio's and then submit them all consecutively in
one go.

Remember: checking the limits as you build stuff up is easy, and fast.

So you should make sure that you never EVER cause anybody to want to
split a bio.

                Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 15 2002 - 21:00:23 EST