Re: [PATCH v2 for-next] RDMA/core: Get IB width and speed from netdev

From: Mark Zhang
Date: Mon Jun 19 2023 - 05:16:15 EST


On 6/19/2023 2:20 PM, Junxian Huang wrote:


On 2023/6/12 1:46, Leon Romanovsky wrote:
On Sat, Jun 03, 2023 at 02:38:33PM +0800, Junxian Huang wrote:
From: Haoyue Xu <xuhaoyue1@xxxxxxxxxxxxx>

Previously, there was no way to query the number of lanes for a network
card, so the same netdev_speed would result in a fixed pair of width and
speed. As network card specifications become more diverse, such fixed
mode is no longer suitable, so a method is needed to obtain the correct
width and speed based on the number of lanes.

I'm sorry but I didn't understand the problem statement. Can you please
provide an example of configuration that will give different results
before this patch and after?


I'll give examples with 20G and 200G netdevs respectively.

20G:
Before this patch, regardless of the actual number of lanes, the width and
speed displayed in ibv_devinfo would be always fixed:
active_width: 4X
active_speed: 5 Gbps
After this patch, there will be different combinations of width and speed
according to the number of lanes. For example, for a 20G netdev whose number
of lanes is 2, the width and speed displayed in ibv_devinfo will be:
active_width: 2X
active_speed: 10 Gbps

200G:
Before this patch, netdevs with netdev_speed more than 40G cannot get a right
width and speed in ibv_devinfo. Only the default result would be displayed:
active_width: 4X
active_speed: 25 Gbps
After this patch, taking an example with 4 lanes, the displayed results will be:
active_width: 4X
active_speed: 50 Gbps

Can we use ib_query_port() instead of __ethtool_get_link_ksettings()?
width = attr.active_width;
speed = ib_speed_enum_to_int(attr.active_speed) *
ib_width_enum_to_int(attr.active_width);



This patch retrieves netdev lanes and speed from net_device and
translates them to IB width and speed. Also, add a generic function
to translating netdev speed to IB speed.

Signed-off-by: Haoyue Xu <xuhaoyue1@xxxxxxxxxxxxx>
Signed-off-by: Luoyouming <luoyouming@xxxxxxxxxx>
Signed-off-by: Junxian Huang <huangjunxian6@xxxxxxxxxxxxx>
---
drivers/infiniband/core/verbs.c | 19 +++++++++++++++++--
include/rdma/ib_verbs.h | 26 ++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index b99b3cc283b6..35f1b670600a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1880,11 +1880,19 @@ int ib_modify_qp_with_udata(struct ib_qp *ib_qp, struct ib_qp_attr *attr,
}
EXPORT_SYMBOL(ib_modify_qp_with_udata);
+static void ib_get_width_and_speed(u32 netdev_speed, u32 lanes,
+ u16 *speed, u8 *width)
+{
+ *width = ib_int_to_ib_width(lanes);
+ *speed = ib_eth_to_ib_speed(netdev_speed / lanes);
+}
+
int ib_get_eth_speed(struct ib_device *dev, u32 port_num, u16 *speed, u8 *width)
{
int rc;
u32 netdev_speed;
struct net_device *netdev;
+ bool cap_link_lanes_supported;
struct ethtool_link_ksettings lksettings;
if (rdma_port_get_link_layer(dev, port_num) != IB_LINK_LAYER_ETHERNET)
@@ -1896,16 +1904,23 @@ int ib_get_eth_speed(struct ib_device *dev, u32 port_num, u16 *speed, u8 *width)
rtnl_lock();
rc = __ethtool_get_link_ksettings(netdev, &lksettings);
+ cap_link_lanes_supported = netdev->ethtool_ops->cap_link_lanes_supported;
rtnl_unlock();
dev_put(netdev);
if (!rc && lksettings.base.speed != (u32)SPEED_UNKNOWN) {
netdev_speed = lksettings.base.speed;
+ if (cap_link_lanes_supported && lksettings.lanes) {

According to the documentation cap_link_lanes_supported defines if
number of lanes can be supplied by user and I would expect from
__ethtool_get_link_ksettings() to get right numbers after it was
changed.

Thanks


I'm sorry but I didn't quite understand. Do you mean the critical section of
rtnl_lock() here should be expanded to make sure getting the right number of
lanes?

Junxian

+ ib_get_width_and_speed(netdev_speed, lksettings.lanes,
+ speed, width);
+ return 0;
+ }
} else {
netdev_speed = SPEED_1000;
- pr_warn("%s speed is unknown, defaulting to %u\n", netdev->name,
- netdev_speed);
+ if (rc)
+ pr_warn("%s speed is unknown, defaulting to %u\n",
+ netdev->name, netdev_speed);
}
if (netdev_speed <= SPEED_1000) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1e7774ac808f..7dc926ec7fee 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -552,6 +552,18 @@ static inline int ib_width_enum_to_int(enum ib_port_width width)
}
}
+static inline int ib_int_to_ib_width(u32 lanes)
+{
+ switch (lanes) {
+ case 1: return IB_WIDTH_1X;
+ case 2: return IB_WIDTH_2X;
+ case 4: return IB_WIDTH_4X;
+ case 8: return IB_WIDTH_8X;
+ case 12: return IB_WIDTH_12X;
+ default: return IB_WIDTH_1X;
+ }
+}
+
enum ib_port_speed {
IB_SPEED_SDR = 1,
IB_SPEED_DDR = 2,
@@ -563,6 +575,20 @@ enum ib_port_speed {
IB_SPEED_NDR = 128,
};
+static inline int ib_eth_to_ib_speed(u32 speed)
+{
+ switch (speed) {
+ case SPEED_2500: return IB_SPEED_SDR;
+ case SPEED_5000: return IB_SPEED_DDR;
+ case SPEED_10000: return IB_SPEED_FDR10;
+ case SPEED_14000: return IB_SPEED_FDR;
+ case SPEED_25000: return IB_SPEED_EDR;
+ case SPEED_50000: return IB_SPEED_HDR;
+ case SPEED_100000: return IB_SPEED_NDR;
+ default: return IB_SPEED_SDR;
+ }
+}
+
enum ib_stat_flag {
IB_STAT_FLAG_OPTIONAL = 1 << 0,
};
--
2.30.0