Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init

From: Shifeng Li
Date: Fri Jan 05 2024 - 03:30:52 EST


On 2024/1/4 20:37, Jason Gunthorpe wrote:
> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>
>> The root cause is that mad_client and cm_client may init concurrently
>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>
> That can't be true, the module loader infrastructure ensures those two
> things are sequential.


I'm a bit confused about how the module loader infrastructure ensures that
mad_client.add() and cm_client.add() are sequential. Could you explain in
more detail, please?

We know that the ib_cm driver and mlx5_ib driver can load concurrently.

Thanks.

> You are trying to say that the post-client fixup stuff will still see
> the DEVICE_REGISTERED before it reaches the clients_rwsem lock?
>
> That probably just says the clients_rwsem should be obtained before
> changing the DEVICE_STATE too :\
>
> Jason