RE: [PATCH v5 00/27] IB/Verbs: IB Management Helpers

From: Liran Liss
Date: Tue Apr 21 2015 - 20:10:30 EST


Hi Michael,

The spirit of this patch-set is great, but I think that we need to clarify some concepts.
Since this will affect the whole patch-set, I am laying out my concerns here instead.

A suggestion for the resulting management helpers is given below.
I believe the result would be much more coherent.
--Liran

In general
========

An ib_dev (or a port of) should be distinguished by 3 qualifiers:
- The link layer:
-- Ethernet (shared by iWARP, USNIC, and ROCE)
-- Infiniband

- The transport (*)
-- IBTA transport (shared by IB and ROCE)
-- iWARP transport
-- USNIC transport

(*) Transport means both:
- The L4 wire protocols (e.g., BTH+ headers of IBTA, optionally encapsulated by UDP in ROCEv2, or the iWARP stack)
- The transport semantics (for example, there are slight semantic differences between IBTA and iWARP)

- The node type (**)
-- CA
-- Switch
-- Router

(**) This has been extended to also encode the transport in the current code.
At least for user-space visible APIs, we might chose to leave this for backward compatibility, but we can consider cleaning up the kernel code.

So, I think that our "old-transport" below is just fine.
No need to change it (and you aren't, since it is currently implemented as a function).

The "new-transport" does not really exist, but is broken into several capability checks of the L4 transport, optionally with conditions on the link type.
I would remove the table below and tell what we really want to achieve:
==> move technology-specific feature-check logic out of the (multiple!) IB code components and various ULPs into per-feature helpers.


Detailed remarks
==============

1) The introduction of cap_*_*() stuff should have been introduced directly in patch 02/27.
This back-and-forth between rdma_ib_or_iboe() and cap_* is confusing and increases the number of patches in the patch-set.
Do this and remove patches 16-24.

2)The name rdma_tech_* is lame.
rdma_transport_*(), adhering to the above (*) remark, is much better.
For example, both IB and ROCE *do* use the same transport.

3) The name cap_* as it is used above is not accurate.
You use it to describe technology characteristics rather than extendable capabilities.
I would suggest having a single convention for all helpers, such as rdma_has_*() and rdma_is_*().
For example: cap_ib_smi() ==> rdma_has_smi().

4) Remove all capabilities that do not introduce any distinction in the current code.
We can add them as needed later.
This means remove patches:
- [PATCH v5 22/27] IB/Verbs: Use management helper cap_ipoib() â all IB devices support ipoib
- [PATCH v5 24/27] IB/Verbs: Use management helper cap_af_ib() â all IB devices support AF_IB.

On the other hand:
- rdma_has_multicast() makes sense, since iWARP doesnât support it.
- cap_ib_sa() might make sense to cut code even further in the CMA, since RoCE has a GSI but no SA.

5) Do no modify phys_state_show() in [PATCH v5 09/27] IB/Verbs: Reform IB-core verbs/uverbs_cmd/sysfs
It *is* the link layer!

6) Remove cap_read_multi_sge
It is not device/port feature, but a transport capability.
Use rdma_is_iwarp_transport() instead, or introduce a new transport flag in 'enum ib_device_cap_flags'.

7) Remove [PATCH v5 25/27] IB/Verbs: Use management helper cap_eth_ah().
Address handles that refer to Ethernet links always have Ethernet addressing.

In the CMA code, using rdma_tech_iboe() is just fine. This is how you define cap_eth_ah() anyway.
Currently, this patch just adds clutter.

8) Remove patch [PATCH v5 26/27] IB/Verbs: Clean up rdma_ib_or_iboe().
We do need a transport qualifier, as exemplified in comment 5) above, and for a complete clean model.
This is after renaming the function to rdma_is_ib_transport()...


Putting it all together
==================

We are left with the following helpers:
- rdma_is_ib_transport()
- rdma_is_iwarp_transport()
- rdma_is_usnic_transport()
- rdma_is_iboe()
- rdma_has_mad()
- rdma_has_smi()
- rdma_has_gsi() - complements smi; can be used by the mad code for clarity
- rdma_has_sa()
- rdma_has_cm()
- rdma_has_mcast()


> Subject: [PATCH v5 00/27] IB/Verbs: IB Management Helpers
>
>
> Since v4:
> * Thanks for the comments from Hal, Sean, Tom, Or Gerlitz, Jason,
> Roland, Ira and Steve :-) Please remind me if anything missed :-P
> * Fix logical issue inside 3#, 14#
> * Refine 3#, 4#, 5# with label 'free'
> * Rework 10# to stop using port 1 when port already assigned
>
> There are plenty of lengthy code to check the transport type of IB device, or
> the link layer type of it's port, but actually we are just speculating whether a
> particular management/feature is supported by the device/port.
>
> Thus instead of inferring, we should have our own mechanism for IB
> management capability/protocol/feature checking, several proposals below.
>
> This patch set will reform the method of getting transport type, we will now
> using query_transport() instead of inferring from transport and link layer
> respectively, also we defined the new transport type to make the concept
> more reasonable.
>
> Mapping List:
> node-type link-layer old-transport new-transport
> nes RNIC ETH IWARP IWARP
> amso1100 RNIC ETH IWARP IWARP
> cxgb3 RNIC ETH IWARP IWARP
> cxgb4 RNIC ETH IWARP IWARP
> usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP
> ocrdma IB_CA ETH IB IBOE
> mlx4 IB_CA IB/ETH IB IB/IBOE
> mlx5 IB_CA IB IB IB
> ehca IB_CA IB IB IB
> ipath IB_CA IB IB IB
> mthca IB_CA IB IB IB
> qib IB_CA IB IB IB
>
> For example:
> if (transport == IB) && (link-layer == ETH) will now become:
> if (query_transport() == IBOE)
>
> Thus we will be able to get rid of the respective transport and link-layer
> checking, and it will help us to add new protocol/Technology (like OPA) more
> easier, also with the introduced management helpers, IB management logical
> will be more clear and easier for extending.
>
> Highlights:
> The patch set covered a wide range of IB stuff, thus for those who are
> familiar with the particular part, your suggestion would be invaluable ;-)
>
> Patch 1#~15# included all the logical reform, 16#~25# introduced the
> management helpers, 26#~27# do clean up.
>
> Patches haven't been tested yet, we appreciate if any one who have these
> HW willing to provide his Tested-by :-)
>
> Doug suggested the bitmask mechanism:
> https://www.mail-archive.com/linux-
> rdma@xxxxxxxxxxxxxxx/msg23765.html
> which could be the plan for future reforming, we prefer that to be another
> series which focus on semantic and performance.
>
> This patch-set is somewhat 'bloated' now and it may be a good timing for
> staging, I'd like to suggest we focus on improving existed helpers and push
> all the further reforms into next series ;-)
>
> Proposals:
> Sean:
> https://www.mail-archive.com/linux-
> rdma@xxxxxxxxxxxxxxx/msg23339.html
> Doug:
> https://www.mail-archive.com/linux-
> rdma@xxxxxxxxxxxxxxx/msg23418.html
> https://www.mail-archive.com/linux-
> rdma@xxxxxxxxxxxxxxx/msg23765.html
> Jason:
> https://www.mail-archive.com/linux-
> rdma@xxxxxxxxxxxxxxx/msg23425.html
>
> Michael Wang (27):
> IB/Verbs: Implement new callback query_transport()
> IB/Verbs: Implement raw management helpers
> IB/Verbs: Reform IB-core mad/agent/user_mad
> IB/Verbs: Reform IB-core cm
> IB/Verbs: Reform IB-core sa_query
> IB/Verbs: Reform IB-core multicast
> IB/Verbs: Reform IB-ulp ipoib
> IB/Verbs: Reform IB-ulp xprtrdma
> IB/Verbs: Reform IB-core verbs/uverbs_cmd/sysfs
> IB/Verbs: Reform cm related part in IB-core cma/ucm
> IB/Verbs: Reform route related part in IB-core cma
> IB/Verbs: Reform mcast related part in IB-core cma
> IB/Verbs: Reserve legacy transport type in 'dev_addr'
> IB/Verbs: Reform cma_acquire_dev()
> IB/Verbs: Reform rest part in IB-core cma
> IB/Verbs: Use management helper cap_ib_mad()
> IB/Verbs: Use management helper cap_ib_smi()
> IB/Verbs: Use management helper cap_ib_cm()
> IB/Verbs: Use management helper cap_iw_cm()
> IB/Verbs: Use management helper cap_ib_sa()
> IB/Verbs: Use management helper cap_ib_mcast()
> IB/Verbs: Use management helper cap_ipoib()
> IB/Verbs: Use management helper cap_read_multi_sge()
> IB/Verbs: Use management helper cap_af_ib()
> IB/Verbs: Use management helper cap_eth_ah()
> IB/Verbs: Clean up rdma_ib_or_iboe()
> IB/Verbs: Cleanup rdma_node_get_transport()
>
> ---
> drivers/infiniband/core/agent.c | 4
> drivers/infiniband/core/cm.c | 26 +-
> drivers/infiniband/core/cma.c | 328 ++++++++++++---------------
> drivers/infiniband/core/device.c | 1
> drivers/infiniband/core/mad.c | 51 ++--
> drivers/infiniband/core/multicast.c | 18 -
> drivers/infiniband/core/sa_query.c | 41 +--
> drivers/infiniband/core/sysfs.c | 8
> drivers/infiniband/core/ucm.c | 5
> drivers/infiniband/core/ucma.c | 27 --
> drivers/infiniband/core/user_mad.c | 32 +-
> drivers/infiniband/core/uverbs_cmd.c | 6
> drivers/infiniband/core/verbs.c | 33 --
> drivers/infiniband/hw/amso1100/c2_provider.c | 7
> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7
> drivers/infiniband/hw/cxgb4/provider.c | 7
> drivers/infiniband/hw/ehca/ehca_hca.c | 6
> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3
> drivers/infiniband/hw/ehca/ehca_main.c | 1
> drivers/infiniband/hw/ipath/ipath_verbs.c | 7
> drivers/infiniband/hw/mlx4/main.c | 10
> drivers/infiniband/hw/mlx5/main.c | 7
> drivers/infiniband/hw/mthca/mthca_provider.c | 7
> drivers/infiniband/hw/nes/nes_verbs.c | 6
> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3
> drivers/infiniband/hw/qib/qib_verbs.c | 7
> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1
> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6
> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2
> drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 -
> include/rdma/ib_verbs.h | 204 +++++++++++++++-
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 6
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 51 +---
> 35 files changed, 584 insertions(+), 368 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html