Re: scheduling bottom halves/task queues

David Miller (davem@twiddle.net)
Wed, 7 Apr 1999 15:12:43 -0700


From: pwhiting@fury.ittc.ukans.edu
Date: Wed, 7 Apr 1999 16:41:17 -0500

To sum up, he would like to have the bottom halves execute in the
context of the process that gave rise to their existence (when such
a process exists.) Matching up seems to be the hard task here. Do
you see any reasons this might be easy/hard, doable/impossible,
good idea/bad idea?

I see a hint here that you probably have considered the issue, but let
me explicityly state it. Many instances of BH processing do work for
no particular task at all, IP routing is a good example of this.
Other examples are IP fragmentation, IP masquerading... actually many
networking instances have this quality.

Some thought has been put into solving your problem backwards, for the
case of TCP. The idea is to make all of the TCP protocol processing
run in the context of the process which owns the socket. This will
fast path most of the time because even though TCP is asynchronous to
the process blocking, most of the time when packets arrive the process
will be blocked in a read/write call of some sort.

When the process isn't, we have a pool of kernel threads which can
perform the protocol processing on behalf of the socket owning
process.

So the path might look like:

ethernet device interrupt
push to IP input
push to TCP input
find socket
add packet to socket backlog queue
if(socket_task is blocked in TCP read/write)
wake up socket_task with high priority
else
pick and wake up TCP input kernel thread

>From ethernet driver to finishing the wakeup could probably be done in
approximately 500 processor cycles on an UltraSparc, perhaps faster if
finely tuned. It would increase hardware interrupt latencies
slightly, but remove the BH processing completely.

The real incentive behind this scheme is it would allow us to do the
copy+checksum operation on packet data receives in TCP direct and once
to userspace. Right now it must be done in two passes, the data is
read once in the BH processing to verify the checksum in TCP input,
and later the data is read again as it is copied into the users
buffer.

WARNING: Totally unproven technology, we don't even know if we can
make this scheme work with all of the peculiarities of TCP's state
machine, timers, etc. But this is what research is all about, isn't
it :-)

Later,
David S. Miller
davem@redhat.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/