RE: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.

From: Felix Marti
Date: Sun Aug 19 2007 - 13:49:25 EST




> -----Original Message-----
> From: general-bounces@xxxxxxxxxxxxxxxxxxxxx [mailto:general-
> bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of David Miller
> Sent: Sunday, August 19, 2007 12:24 AM
> To: sean.hefty@xxxxxxxxx
> Cc: netdev@xxxxxxxxxxxxxxx; rdreier@xxxxxxxxx;
> general@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> jeff@xxxxxxxxxx
> Subject: Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate
> PS_TCPportsfrom the host TCP port space.
>
> From: "Sean Hefty" <sean.hefty@xxxxxxxxx>
> Date: Sun, 19 Aug 2007 00:01:07 -0700
>
> > Millions of Infiniband ports are in operation today. Over 25% of
the
> top 500
> > supercomputers use Infiniband. The formation of the OpenFabrics
> Alliance was
> > pushed and has been continuously funded by an RDMA customer - the US
> National
> > Labs. RDMA technologies are backed by Cisco, IBM, Intel, QLogic,
> Sun, Voltaire,
> > Mellanox, NetApp, AMD, Dell, HP, Oracle, Unisys, Emulex, Hitachi,
> NEC, Fujitsu,
> > LSI, SGI, Sandia, and at least two dozen other companies. IDC
> expects
> > Infiniband adapter revenue to triple between 2006 and 2011, and
> switch revenue
> > to increase six-fold (combined revenues of 1 billion).
>
> Scale these numbers with reality and usage.
>
> These vendors pour in huge amounts of money into a relatively small
> number of extremely large cluster installations. Besides the folks
> doing nuke and whole-earth simulations at some government lab, nobody
> cares. And part of the investment is not being done wholly for smart
> economic reasons, but also largely publicity purposes.
>
> So present your great Infiniband numbers with that being admitted up
> front, ok?
>
> It's relevance to Linux as a general purpose operating system that
> should be "good enough" for %99 of the world is close to NIL.
>
> People have been pouring tons of money and research into doing stupid
> things to make clusters go fast, and in such a way that make zero
> sense for general purpose operating systems, for ages. RDMA is just
> one such example.
[Felix Marti] Ouch, and I believed linux to be a leading edge OS,
scaling from small embedded systems to hundreds of CPUs and hence
I assumed that the same 'scalability' applies to the network subsystem.

>
> BTW, I find it ironic that you mention memory bandwidth as a retort,
> as Roland's favorite stateless offload devil, TSO, deals explicity
> with lowering the per-packet BUS bandwidth usage of TCP. LRO
> offloading does likewise.

[Felix Marti] Aren't you confusing memory and bus BW here? - RDMA
enables DMA from/to application buffers removing the user-to-kernel/
kernel-to-user memory copy with is a significant overhead at the
rates we're talking about: memory copy at 20Gbps (10Gbps in and 10Gbps
out) requires 60Gbps of BW on most common platforms. So, receiving and
transmitting at 10Gbps with LRO and TSO requires 80Gbps of system
memory BW (which is beyond what most systems can do) whereas RDMA can
do with 20Gbps!

In addition, BUS improvements are really not significant (nor are buses
the bottleneck anymore with wide availability of PCI-E >= x8); TSO
avoids
the DMA of a bunch of network headers... a typical example of stateless
offload - improving performance by a few percent while offload
technologies
provide system improvements of hundreds of percent.

I know that you don't agree that TSO has drawbacks, as outlined by
Roland,
but its history showing something else: the addition of TSO took a fair
amount of time and network performance was erratic for multiple kernel
revisions and the TSO code is sprinkled across the network stack. It is
an
example of an intrusive 'improvement' whereas Steve (who started this
thread) is asking for a relatively small change (decoupling the 4-tuple
allocation from the socket). As Steve has outlined, your refusal of the
change requires RDMA users to work around the issue which pushes the
issue to the end-users and thus slowing down the acceptance of the
technology leading to a chicken-and-egg problem: you only care if there
are lots of users but you make it hard to use the technology in the
first
place, clever ;)

> _______________________________________________
> general mailing list
> general@xxxxxxxxxxxxxxxxxxxxx
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-
> general
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/