RE: Make pipe data structure be a circular list of pages, rather than

From: Robert White
Date: Wed Jan 19 2005 - 21:10:34 EST


Howdy,

The below (sorry about the Outlook style includes, work _requires_ outlook 8-() is
why I previously mentioned the STREAMS implementation from SVR4 (and so Solaris etc).

Every interface point has a pointer to its peer, a pair of queues and a put()
routine. Every message is in two or more pieces, the message header and the message
buffer/buffers (read real data pages).

So the "participating interface" requirement is really that the device be able to
expose a data structure consisting of four pointers that their peer can point to.

The "message types" for the messages passing around are all contained in the message
header structures. These headers have all the reference counting and
disposal/cleanup hook procedures (pointers) along with the references to the actual
data. This means that you could have a stock pool (from the slab cache) of these
headers for moving plain data around, and "magical devices" could use specialty
message headers with the necessary disposal elements or necessary "extra data" in
them.

All the devices in any kind of chain/mux/pipe/whatever arrangement pass messages to
their peers by invoking peer->put(message) [in OOD parlance, really probably
(*peer)(peer,message) in straight C]. And modules can be induced into the
chain/mux/pipe/whtever at runtime. The target put() procedure decides to copy or
reference-increment the message and put it on it's receive queue or process it
immediately. At no time does any member of a STREAMS structure need to know any
details about the workings of their peers.

In the abstract, most communications would have message headers referring to blocks
of normal memory. Some devices could send out messages that reference their PCI
mapped memory or whatever, and all _that_ device would have to do to mesh with the
system is to replace the cleanup function pointer with one appropriate to its need.
When the message has gone as far as it is going to go, its reference count drops to
zero and it gets cleaned up.

Unget can be done with some queue insertion/manipulation. There is no pull() as
every target is either hot-processed or queued. Flow control takes place when a
put() procedure returns a no-space-for-message indication.

If you absolutely _must_ know if the data was actually consumed by the target (and I
could argue that this is virtually never the case as things get lost in error-induced
pipe tear-down all the time) it would only be necessary to add a "cursor" member to
the message header and, at destruction time, see if the cursor had been advanced to
"use up" the data.

One could even have message headers that know they are dealing with

Now the message header is Linus' "impedance matcher" interface, transport is really
just about knowing your peer, calling put(), and doing reference counting.

There were a lot of (IMHO) mistakes in the STREAMS design, but the programming
document (previously referenced in this thread) covers a lot of the same message
typing and management questions that are being raised here.

While I can see that the fit between Linux and SVR4 STREAMS wouldn't be identical, a
good bit of the pipe-fitting (to pun 8-) and operation definitions would be.

(I'll go back to lurking now... 8-)

Rob White,
Casabyte, Inc.


-----Original Message-----
From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx]
On Behalf Of linux@xxxxxxxxxxx
lm@xxxxxxxxxxxx wrote:

The main infrastructure hassle you need to support this *universally*
is the unget() on "character devices" like pipes and network sockets.
Ideally, it would be some generic buffer front end that could be used
by the device for normal data as well as the special case.

Ooh. Need to think. If there's a -EIO problem on one of the file
descriptors, how does the caller know which one? That's an argument for
separate pull and push (although the splice() library routine still
has the problem). Any suggestions? Does userland need to fall
back on read()/write() for a byte?



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/