Re: Efficient IPC mechanism on Linux

From: Andrea Arcangeli
Date: Wed Sep 10 2003 - 13:01:37 EST


On Wed, Sep 10, 2003 at 07:39:17PM +0200, Martin Konold wrote:
> On Wednesday 10 September 2003 06:59 pm, Andrea Arcangeli wrote:
>
> Hi,
>
> > design that I'm suggesting. Obviously lots of apps are already using
> > this design and there's no userspace API simply because that's not
> > needed.
>
> HPC people have investigated high-performance IPC many times. Basically it
> boils down to:
>
> 1. Userspace is much more efficient than kernel space. So efficient
> implementations avoid kernel space even for message transfers over networks
> (e.g. DMA directly to userspace).
>
> 2. The optimal protocol to use and the number of copies to make depend on
> the message size.
>
> Small messages are most efficiently handled with a single/dual copy (short
> protocol / eager protocol) and large messages are more efficient with
> single/zero copy techniques (get protocol), depending on whether a network is
> involved.
>
> Traditionally in a networked environment single copy means PIO and zero copy
> means DMA when using network hardware.
>
> The idea is that, while DMA has much higher bandwidth than PIO, it is more
> expensive to initiate. So DMA is only useful for large messages.

agreed.
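
A minimal sketch of the size-based choice described above, for the local
shared-memory case. Everything named here (EAGER_LIMIT, choose_protocol, the
slot and descriptor structs) is hypothetical and not taken from any MPI
implementation; the cut-over value is a placeholder that would have to be
measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-over size. The quoted half-bandwidth points (0.3 KB and
 * 2.5 KB) only hint at the range; the real value must be measured. */
#define EAGER_LIMIT 2048

enum protocol { PROTO_EAGER, PROTO_GET };

/* Small messages go eager (low start-up cost), large ones use get. */
static enum protocol choose_protocol(size_t len)
{
    return len <= EAGER_LIMIT ? PROTO_EAGER : PROTO_GET;
}

/* Eager: the sender copies the payload into a shared slot at once (copy 1)
 * and the receiver copies it out later (copy 2). */
struct eager_slot {
    size_t len;
    unsigned char data[EAGER_LIMIT];
};

static void eager_send(struct eager_slot *slot, const void *msg, size_t len)
{
    assert(len <= EAGER_LIMIT);
    memcpy(slot->data, msg, len);
    slot->len = len;
}

static size_t eager_recv(const struct eager_slot *slot, void *out)
{
    memcpy(out, slot->data, slot->len);
    return slot->len;
}

/* Get: the sender publishes only a descriptor. The receiver performs the
 * single copy straight from the sender's buffer, which must therefore be
 * visible to the receiver (e.g. it lives in shared memory). */
struct get_desc {
    const void *src;
    size_t len;
};

static void get_send(struct get_desc *d, const void *msg, size_t len)
{
    d->src = msg;
    d->len = len;
}

static size_t get_recv(const struct get_desc *d, void *out)
{
    memcpy(out, d->src, d->len);
    return d->len;
}
```

The eager path pays for a second copy but needs no rendezvous with the
receiver; the get path copies once but the sender's buffer has to stay
untouched until the receiver has fetched it.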

>
> In the local SMP case there do exist userspace APIs like MPI which can do

btw, so far we were only discussing IPC in a local box (UP or SMP or
NUMA) w/o networking involved. Luca's current implementation was also
only working locally.

> single copy message passing at memory speed in pure userspace, and have been
> able to for many years.
>
> The following PDF documents a measurement where the communication between two
> processes running on different CPUs in an SMP system is exactly the memcpy
> bandwidth for large messages using a single copy get protocol.
>
> http://ipdps.eece.unm.edu/1999/pc-now/takahash.pdf
>
> Numbers from a Dual P-II-333, Intel 440LX (100MB/s memcpy)
>
>                              eager      get
> min. latency (µs)             8.62     9.98
> max. bandwidth (MB/s)        48.03   100.02
> half-bandwidth point (KB)      0.3      2.5
>
> You can easily see that the eager has better latency for very short messages
> and that the get has a max bandwidth equivalent to that of a memcpy (single
> copy).
>
> True zero copy has unlimited (sigh!) bandwidth within an SMP and does not
> really make sense in contrast to a network.

if you can avoid entering the kernel, you'd better do that, because entering
the kernel will take much more time than the copy itself.

with the shm/futex approach you can also have a ring buffer to handle
parallelism better while it's at the same time zerocopy and entirely
userspace based in the best case (though that's not the common case).
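
A rough sketch of what such a shm/futex ring could look like, assuming one
producer and one consumer, fixed-size slots, and Linux with C11 atomics. It
is not Luca's implementation; all names and sizes (struct ring, RING_SLOTS,
SLOT_BYTES, ring_reserve and friends) are made up for illustration. The
producer writes each message in place in the shared mapping and the consumer
reads it in place, and futex() is called only when the consumer finds the
ring empty or the producer sees a sleeper to wake.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define RING_SLOTS 8u    /* hypothetical slot count */
#define SLOT_BYTES 256u  /* hypothetical fixed payload size */

struct ring {
    _Atomic uint32_t tail;     /* slots published by the producer; futex word */
    _Atomic uint32_t head;     /* slots released by the consumer */
    _Atomic uint32_t sleepers; /* nonzero while the consumer may be blocked */
    unsigned char slot[RING_SLOTS][SLOT_BYTES];
};

static long futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

/* Map the ring MAP_SHARED so a child created by fork() sees the same pages.
 * Anonymous mappings start zeroed, which is the empty-ring state. */
static struct ring *ring_create(void)
{
    void *p = mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Producer: return a slot to fill in place, or NULL if the ring is full. */
static void *ring_reserve(struct ring *r)
{
    uint32_t tail = atomic_load(&r->tail);
    uint32_t head = atomic_load(&r->head);
    return (tail - head == RING_SLOTS) ? NULL : r->slot[tail % RING_SLOTS];
}

/* Producer: publish the reserved slot. The kernel is entered only if the
 * consumer has announced that it is (about to be) asleep. */
static void ring_commit(struct ring *r)
{
    atomic_fetch_add(&r->tail, 1);
    if (atomic_load(&r->sleepers))
        futex(&r->tail, FUTEX_WAKE, 1);
}

/* Consumer: return the oldest published slot to read in place.
 * Blocks in the kernel only while the ring is empty. */
static const void *ring_peek(struct ring *r)
{
    uint32_t head = atomic_load(&r->head);
    for (;;) {
        uint32_t tail = atomic_load(&r->tail);
        if (tail != head)
            return r->slot[head % RING_SLOTS];
        atomic_fetch_add(&r->sleepers, 1);
        futex(&r->tail, FUTEX_WAIT, tail); /* returns at once if tail moved */
        atomic_fetch_sub(&r->sleepers, 1);
    }
}

/* Consumer: hand the slot returned by ring_peek() back to the producer. */
static void ring_release(struct ring *r)
{
    assert(atomic_load(&r->head) != atomic_load(&r->tail));
    atomic_fetch_add(&r->head, 1);
}
```

When the consumer keeps up, both sides stay in userspace. The sketch leaves
out a blocking producer, variable-size messages and multiple writers, and for
unrelated processes the anonymous mapping would have to be replaced by a
named shared memory object.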

thanks,

Andrea

/*
* If you refuse to depend on closed software for a critical
* part of your business, these links may be useful:
*
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.4/
* http://www.cobite.com/cvsps/
*
* svn://svn.kernel.org/linux-2.6/trunk
* svn://svn.kernel.org/linux-2.4/trunk
*/