[PATCH] RDMA/device: Fix a race between mad_client and cm_client init

From: Shifeng Li
Date: Mon Jan 01 2024 - 22:53:11 EST


enable_device_and_get() initializes the mad_client only after
devices_rwsem has been downgraded from a write lock to a read lock. This
leaves a window in which another thread can take the read lock and
register the cm_client. The cm_client initialization then fails because
it cannot find a matching MAD port in ib_mad_port_list; the port is only
added to the list afterwards, when the mad_client is initialized.

mad_client                                              | cm_client
--------------------------------------------------------|---------------------------------------------------------
ib_register_device                                      |
enable_device_and_get                                   |
down_write(&devices_rwsem)                              |
xa_set_mark(&devices, DEVICE_REGISTERED)                |
downgrade_write(&devices_rwsem)                         |
                                                        |
                                                        |ib_cm_init
                                                        |ib_register_client(&cm_client)
                                                        |down_read(&devices_rwsem)
                                                        |xa_for_each_marked (&devices, DEVICE_REGISTERED)
                                                        |add_client_context
                                                        |cm_add_one
                                                        |ib_register_mad_agent
                                                        |ib_get_mad_port
                                                        |__ib_get_mad_port
                                                        |list_for_each_entry(entry, &ib_mad_port_list, port_list)
                                                        |return NULL
                                                        |up_read(&devices_rwsem)
                                                        |
add_client_context                                      |
ib_mad_init_device                                      |
ib_mad_port_open                                        |
list_add_tail(&port_priv->port_list, &ib_mad_port_list) |
up_read(&devices_rwsem)                                 |
                                                        |

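Fix it by holding devices_rwsem for write across the whole client init
flow in enable_device_and_get(), instead of downgrading it to a read
lock, so that the cm_client cannot be registered until the mad_client
has been initialized.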
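
The interleaving can be replayed in a small single-threaded user-space
model. This is only a sketch for illustration, not kernel code: the
struct and function names (rwsem_model, core_model, cm_client_register,
replay) are hypothetical, and the semaphore is reduced to a writer flag
plus a reader count. It assumes that a reader blocked by the writer
simply runs once the semaphore is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rw_semaphore: one writer or many readers. */
struct rwsem_model {
	bool writer;
	int readers;
};

struct core_model {
	struct rwsem_model devices_rwsem;
	bool device_registered;	/* models the DEVICE_REGISTERED mark */
	bool mad_port_listed;	/* models the entry on ib_mad_port_list */
};

enum cm_result { CM_BLOCKED, CM_NO_MAD_PORT, CM_OK };

/* A reader may enter only while no writer holds the semaphore. */
static bool model_try_down_read(struct rwsem_model *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void model_up_read(struct rwsem_model *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Models ib_register_client(&cm_client) walking the registered devices. */
static enum cm_result cm_client_register(struct core_model *m)
{
	enum cm_result res = CM_OK;

	if (!model_try_down_read(&m->devices_rwsem))
		return CM_BLOCKED;	/* the real caller would sleep here */
	if (m->device_registered && !m->mad_port_listed)
		res = CM_NO_MAD_PORT;	/* the MAD port lookup returns NULL */
	model_up_read(&m->devices_rwsem);
	return res;
}

/*
 * Replays enable_device_and_get() with the cm_client registering right
 * after the device is marked. downgrade == true models the old code,
 * downgrade == false models holding the write lock until the end.
 */
static enum cm_result replay(bool downgrade)
{
	struct core_model m = { 0 };
	enum cm_result res;

	m.devices_rwsem.writer = true;		/* down_write() */
	m.device_registered = true;		/* xa_set_mark() */

	if (downgrade) {			/* downgrade_write() */
		m.devices_rwsem.writer = false;
		m.devices_rwsem.readers = 1;
	}

	res = cm_client_register(&m);		/* cm_client enters the window */

	m.mad_port_listed = true;		/* mad_client init lists the port */

	if (downgrade)
		model_up_read(&m.devices_rwsem);	/* up_read() */
	else
		m.devices_rwsem.writer = false;		/* up_write() */

	if (res == CM_BLOCKED)			/* blocked reader runs now */
		res = cm_client_register(&m);
	return res;
}
```

In this model replay(true) returns CM_NO_MAD_PORT, matching the failure
in the diagram, while replay(false) returns CM_OK because the cm_client
cannot enter until the MAD port is on the list.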

Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration")
Cc: Shifeng Li <lishifeng1992@xxxxxxx>
Signed-off-by: Shifeng Li <lishifeng@xxxxxxxxxxxxxx>
---
drivers/infiniband/core/device.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 67bcea7a153c..85782786993d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device)
 	down_write(&devices_rwsem);
 	xa_set_mark(&devices, device->index, DEVICE_REGISTERED);

-	/*
-	 * By using downgrade_write() we ensure that no other thread can clear
-	 * DEVICE_REGISTERED while we are completing the client setup.
-	 */
-	downgrade_write(&devices_rwsem);
-
 	if (device->ops.enable_driver) {
 		ret = device->ops.enable_driver(device);
 		if (ret)
@@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device)
 	if (!ret)
 		ret = add_compat_devs(device);
 out:
-	up_read(&devices_rwsem);
+	up_write(&devices_rwsem);
 	return ret;
 }

--
2.25.1