Re: Scheduled Transfer Protocol on Linux

From: Larry McVoy (lm@bitmover.com)
Date: Sun Feb 13 2000 - 15:58:13 EST


: +-----
: | The original proposal from Larry McVoy was to replace the electronics now
: | inside a disk drive with an embedded general purpose processor. I'll repeat
: +--->8
:
: So? This differs from current SCSI drives only in degree, not in kind. If
: the drive vendors felt the need to do it, I feel quite certain they could do
: it, and in sufficient volume to make it competitive with high-end SCSI.

Exactly.

: | Imagine that cheapo IDE drives could be bought soon with 100BaseT and
: | not so soon with GigEth over copper connectors. They cost about $100
: | more than your current IDE drives (which are essentially free :-)
: |
: | Imagine Linux with STP in the kernel on _both_ ends of the connection.
: +--->8
:
: Right, "simple" NASD drives in a private SAN. This differs from SCSI
: exactly how?

SCSI already runs layered on top of STP just fine, so to some extent it
doesn't differ at all.

What it does do is give you a high bandwidth, low CPU cycle way of moving
the data.

The simplified view of STP is really just remote DMA. How it works, again
greatly simplified, is sort of like this:

        client                  server
        request to send 1GB ->
                
                        <- please wait while I set up some pages (optional)

                        <- clear to send first 100MB of data, here's a cookie
        
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->
        data on wire, prefixed with incrementing cookie ->

                        -> data hits NIC, cookie is index into CAM, exactly
                           like an ethernet switch indexes MAC addresses,
                           except that the value here is a physical page
                           address instead of an outgoing port

                        -> data hits NIC ....
                        -> data hits NIC ....
                        -> data hits NIC ....
                        -> data hits NIC ....
                        -> data hits NIC ....
                        -> data hits NIC .... we're at 75MB or so
                           
                           server goes off and locks down another 100MB

                        <- CTS another 100MB (this happens well before the
                           first 100MB is drained, so there are no bubbles,
                           the pipe is 100% full at all times)

I trust that the SGI guys, who know this stuff better than I, will correct
any problems with this description. I'm sure there are some, but I'm also
sure this is pretty close to how it works modulo flow control, security,
and other important bits.
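
To see why the early CTS matters, here is a toy Python sketch of the
exchange above. It is not STP: the one-packet-per-tick model, the function
name simulate(), and the cts_delay parameter (how long a new CTS takes to
reach the sender) are all assumptions made up for illustration.

```python
def simulate(total_mb, window_mb, regrant_at, cts_delay):
    """Toy model: the sender moves one 1MB packet per tick while it has credit.

    regrant_at is how far into each window (in MB received) the receiver
    locks down the next window and sends another CTS. cts_delay is the
    number of ticks that CTS needs to reach the sender (hypothetical).
    Returns (ticks, stalled_ticks).
    """
    if not 0 < regrant_at <= window_mb:
        raise ValueError("regrant_at must fall inside a window")
    credit = window_mb           # MB the sender may still send (initial CTS)
    granted = window_mb          # total MB the receiver has granted so far
    received = 0
    pending = []                 # arrival ticks of CTS messages in flight
    next_threshold = regrant_at  # received-MB level that triggers the next CTS
    tick = stalls = 0
    while received < total_mb:
        # Deliver every CTS that has reached the sender by now.
        while pending and pending[0] <= tick:
            pending.pop(0)
            credit += window_mb
        if credit > 0:
            credit -= 1
            received += 1
            # Receiver side: grant the next window before this one drains.
            if received >= next_threshold and granted < total_mb:
                granted += window_mb
                pending.append(tick + cts_delay)
                next_threshold += window_mb
        else:
            stalls += 1          # a bubble: data to send, nowhere to put it
        tick += 1
    return tick, stalls


early = simulate(1000, 100, regrant_at=75, cts_delay=20)
late = simulate(1000, 100, regrant_at=100, cts_delay=20)
print("CTS at 75MB: ", early)
print("CTS at 100MB:", late)
```

With these made-up numbers the sender never stalls when the next CTS goes
out at the 75MB mark, and it idles after every window when the receiver
waits until the window is fully drained.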

What's important is that you notice two things:

    (a) this is what I call "polite networking". Normal networking is
        impolite: the packets show up on your doorstep uninvited. Here,
        the receiver is nicely asked if it is OK, and gets to set things
        up first. Just like in DMA.
    
    (b) There is some hardware on the NIC that translates from packets to
        physical page addresses. This happens inline, so the packet gets
        DMAed into memory as it is coming off the wire, which keeps
        buffering requirements very low.
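
Point (b) can be sketched in a few lines of Python, with a dict standing in
for the CAM and a bytearray standing in for physical memory. ToyNic,
grant(), on_packet(), the one-packet-per-page layout, and the one-shot
cookies are all assumptions for illustration; real hardware does this
lookup inline and may differ in every detail.

```python
PAGE_SIZE = 4096  # assumed page size; one packet fills at most one page here


class ToyNic:
    """Hypothetical receive side: cookie -> physical page address."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)  # stand-in for physical memory
        self.cam = {}                          # stand-in for the CAM
        self.dropped = 0

    def grant(self, first_cookie, page_addrs):
        """Receiver sets up pages first (the polite part), one cookie each."""
        for i, addr in enumerate(page_addrs):
            self.cam[first_cookie + i] = addr
        return range(first_cookie, first_cookie + len(page_addrs))

    def on_packet(self, cookie, payload):
        """Place the payload straight into the page the cookie maps to."""
        if len(payload) > PAGE_SIZE or cookie not in self.cam:
            self.dropped += 1                  # uninvited data is not accepted
            return False
        addr = self.cam.pop(cookie)            # assumed: a cookie is used once
        self.memory[addr:addr + len(payload)] = payload
        return True


nic = ToyNic(4 * PAGE_SIZE)
cookies = nic.grant(100, [2 * PAGE_SIZE, 0])   # pages need not be contiguous
nic.on_packet(cookies[0], b"a" * PAGE_SIZE)    # lands at 2 * PAGE_SIZE
nic.on_packet(cookies[1], b"b" * PAGE_SIZE)    # lands at 0
```

Note that nothing is queued for later processing: each payload goes
directly to its final location, which is the point of the low buffering
claim.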

So to get back to your question, SCSI is a fine protocol to layer on top
of this. What STP is doing is giving you the illusion of large blocks in
a fabric that has small packets. So if SCSI wants to do a 1MB transfer,
it sees that over STP the same way it sees it over a local bus, as one
big DMA.
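
A rough Python sketch of that illusion: the layer above hands over one 1MB
block and gets one 1MB block back, while the wire only ever carries small
packets. The names send_block() and receive_block(), the 1500-byte payload
size, and the tolerance for out-of-order arrival are assumptions, not
anything taken from the STP spec.

```python
def send_block(block, first_cookie, payload_size):
    """Sender: cut one large block into (cookie, chunk) packets."""
    for i in range(0, len(block), payload_size):
        yield first_cookie + i // payload_size, block[i:i + payload_size]


def receive_block(packets, first_cookie, payload_size, length):
    """Receiver: put each chunk at the offset its cookie implies."""
    buf = bytearray(length)
    for cookie, chunk in packets:
        offset = (cookie - first_cookie) * payload_size
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)


block = bytes(range(256)) * 4096                 # the 1MB "SCSI transfer"
packets = list(send_block(block, first_cookie=7, payload_size=1500))
packets.reverse()                                # arrival order is irrelevant
assert receive_block(packets, 7, 1500, len(block)) == block
```

Because the cookie alone decides where a chunk lands, the upper layer never
sees the packet boundaries.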

It should be obvious that this is a huge win over traditional networking,
right? If you want to go really fast, I don't know how else you can do
it.
