Efficient edge-triggered event interface

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Wed Oct 25 2000 - 14:36:10 EST

Next message: Jamie Lokier: "Re: kqueue microbenchmark results"
Previous message: Trond Myklebust: "Re: nfsv3d wrong truncates over 4G"
In reply to: Simon Kirby: "Re: kqueue microbenchmark results"
Next in thread: Jonathan Lemon: "Re: kqueue microbenchmark results"

This message is about how edge-triggered events can work. They must be
the right kind of edges if they are to be efficient. Suggestions follow.

Simon Kirby wrote:
> What happens at "wait until output is ready for writing then goto 6"?
> You mean you would stop the main loop to wait for a single client to
> unclog?

I mean stop *just this state machine*. Go back to the main loop to
process the others. What I wrote was a simplification.

The precise condition can be complicated. You probably want to process
some more requests from the same client even while the output is
blocked, so that you have a few responses ready when it unblocks. Also,
you may wish to base the limit on how many requests/responses take up
your memory on more global things, such as how full your buffers are
for other clients.

> Wouldn't you just do this? ->
>
> 1. Wait for event (read and write queued). Event occurs: Incoming
> data available.
> 2. Read a block.
> 3. Process block just read: Does it contain a full request? If not,
> queue, goto 2, munge together. If no more data, queue beginning
> of request, if any, and goto 1.
> 4. Walk over available requests in block just read. Process.

Care in step 4. Processing all the requests may generate more data than
you're prepared to buffer.

> 5. Attempt to write response, if any.
> 6. Attempted write: Did it all get out? If not, queue waiting
> writable data and goto 1 to wait for a write event.

I'd attempt to write() each response as its request is processed, if the
responses are large ones, e.g. files over HTTP. For small ones,
multiple responses are coalesced in the per-connection output buffer.
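
The caution about step 4 and the coalescing of small responses can be
sketched together. This is a minimal illustration in Python, not code
from either server: MAX_BUFFERED, process_requests() and the handle
callback are hypothetical names, and the cap is deliberately tiny.

```python
from collections import deque

MAX_BUFFERED = 16  # hypothetical cap on unsent output bytes per connection


def process_requests(pending, out, handle):
    """Process queued requests until the output buffer reaches the cap.

    pending: deque of parsed requests for one connection
    out:     bytearray holding responses not yet written to the socket
    handle:  callable turning one request into response bytes

    Returns the number of requests processed; the rest stay in `pending`
    until the output buffer has been written out.
    """
    done = 0
    while pending and len(out) < MAX_BUFFERED:
        out += handle(pending.popleft())  # small responses coalesce in `out`
        done += 1
    return done
```

A real server would probably derive the cap from global memory
pressure rather than a constant, as discussed above.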

Now, about events models.

Writes and edge-triggered events
--------------------------------

The write event would be edge-triggered by write becoming possible after
it wasn't possible. I.e., the transition from when write() would return
EAGAIN to when it wouldn't. That's fine for both our servers I think.

Just as with select/poll, the server can choose to retain a flag saying
whether the socket can be written to, to avoid redundant write() system
calls. However, it doesn't have to do this; edge-triggered events will
work without it.
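
A minimal sketch of the optional writable flag, in Python. The Writer
class and its on_write_event() method are hypothetical; the event
source that would call on_write_event() on the "write became possible"
edge is assumed and not shown.

```python
import socket


class Writer:
    """Non-blocking writer that remembers whether the socket is writable."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.writable = True       # assume writable until a send would block
        self.pending = bytearray()
        self.send_calls = 0        # counts system calls, for illustration

    def write(self, data):
        self.pending += data
        if self.writable:          # skip the system call when known blocked
            self._drain()

    def on_write_event(self):
        """To be called when the (assumed) event source reports the edge."""
        self.writable = True
        self._drain()

    def _drain(self):
        while self.pending:
            try:
                self.send_calls += 1
                n = self.sock.send(self.pending)
            except BlockingIOError:    # EAGAIN: wait for the next edge
                self.writable = False
                return
            del self.pending[:n]
```

Dropping the flag and always calling _drain() from write() would still
be correct; it would only cost a send() that fails with EAGAIN whenever
the socket is already known to be blocked.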

Reads and edge-triggered events
-------------------------------

Receiving unnecessary events does no harm to correctness, but of course
we prefer to avoid redundant events. We _must_ receive an event whenever
data becomes available if the application's state machine has decided to
wait for one.

Choice of two obvious rules:

1. Event whenever input buffer switches from empty to non-empty.
2. Event whenever new data arrives.

Rule 1 is the sane, obvious one that corresponds with poll() semantics.
Unfortunately it has an overhead. As an application, I have to ensure I
don't ever wait unless I _know_ that the input buffer is empty.

That means I must not wait on a descriptor until read() has returned
EAGAIN. Seems reasonable -- but it's actually more system calls than a
select/poll loop.

Why? With select/poll, if I call read() and get a short result, it's
reasonable to assume the next read() call is *likely* to return EAGAIN. In
other words I will go back to the main loop and wait until select/poll
reports POLLIN, rather than trying a second read().

That's select/read/write per transaction. Rule 1 would force me to do
kevent/read/read/write per transaction. When you're down to 3 system
calls, one more is significant.
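
The extra call can be seen in a small sketch, here in Python on a
non-blocking socket. read_until_eagain() and read_once() are
hypothetical helper names; the first follows what rule 1 demands, the
second is the select/poll habit of trusting a short read.

```python
import socket


def read_until_eagain(sock, bufsize=4096):
    """Read until the kernel reports EAGAIN, as rule 1 requires.

    Returns (data, number_of_read_calls).
    """
    chunks, calls = [], 0
    while True:
        calls += 1
        try:
            chunk = sock.recv(bufsize)
        except BlockingIOError:    # EAGAIN: the input buffer is now empty
            break
        if not chunk:              # peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks), calls


def read_once(sock, bufsize=4096):
    """select/poll style: one read, then back to the main loop."""
    return sock.recv(bufsize), 1
```

For a request that fits in one buffer, read_until_eagain() makes two
calls (the data, then EAGAIN) where read_once() makes one.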

Rule 2 guarantees to wake me up, but it has a big overhead. If more and
more new data arrives but I don't want to read it right now, either I'm
going to receive events which I ignore, or I must call bind_event()
twice for the interval when I'm not interested. I receive lots of
events because I'm calling kevent() to process _other_ descriptors many
times in my main loop before I'm ready to read this one.

This is unnecessary overhead -- I already know there is more data.

So, is there a rule 3? One which fits my model (minimum system calls)
is:

3. Rule 1, plus an event whenever read() does not consume the whole buffer.

Pros:

- Minimum system calls
- Avoids "NOTPOLLIN" events, as someone suggested.
- Interface is still beautiful and simple.
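
The three rules can be compared with a toy model of an input buffer
that counts the events it would post. This is only an illustration in
Python under simplifying assumptions (one descriptor, no real kernel
involved); InputBuffer, arrive() and read() are hypothetical names.

```python
class InputBuffer:
    """Toy model of a socket input buffer that posts edge events.

    rule 1: event when the buffer goes from empty to non-empty
    rule 2: event on every arrival of new data
    rule 3: rule 1, plus an event when read() leaves data behind
    """

    def __init__(self, rule):
        self.rule = rule
        self.data = b""
        self.events = 0

    def arrive(self, data):
        """New data arrives from the network."""
        was_empty = not self.data
        self.data += data
        if self.rule == 2 or was_empty:
            self.events += 1

    def read(self, n):
        """The application reads up to n bytes."""
        chunk, self.data = self.data[:n], self.data[n:]
        if self.rule == 3 and self.data:
            self.events += 1       # leftover data posts another event
        return chunk


def events_while_busy(rule):
    """Three arrivals while the application is busy, then one full read."""
    buf = InputBuffer(rule)
    for _ in range(3):
        buf.arrive(b"abcd")
    buf.read(100)
    return buf.events
```

In this model rules 1 and 3 post a single event for the three arrivals
and rule 2 posts three. Under rule 3 a read() that leaves data behind
posts a further event, so the application does not need a second
read() just to learn that the buffer is empty.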

I'll leave it to our top API designer to decide.

enjoy,
-- Jamie

This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:16 EST