Re: [PATCH for-next 3/3] RDMA/rtrs: Avoid use-after-free in rtrs_clt_rdma_cm_handler

From: Guoqing Jiang
Date: Tue Apr 11 2023 - 21:15:13 EST




On 4/11/23 09:33, Zhijian Li (Fujitsu) wrote:

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 4c8f42e46e2f..760a7eb51297 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2074,6 +2074,7 @@ static int create_cm(struct rtrs_clt_con *con)
          rtrs_err(s, "Failed to resolve address, err: %d\n", err);
          goto destroy_cm;
      }
+again:
      /*
       * Combine connection status and session events. This is needed
       * for waiting two possible cases: cm_err has something meaningful
@@ -2083,10 +2084,15 @@ static int create_cm(struct rtrs_clt_con *con)
              clt_path->state_wq,
              con->cm_err || clt_path->state != RTRS_CLT_CONNECTING,
              msecs_to_jiffies(RTRS_CONNECT_TIMEOUT_MS));
-    if (err == 0 || err == -ERESTARTSYS) {
-        if (err == 0)
-            err = -ETIMEDOUT;
-        /* Timedout or interrupted */
+    if (err == -ERESTARTSYS) {
+        /* interrupted,
+         * try again to avoid the in-flight rtrs_clt_rdma_cm_handler()
+         * getting a use-after-free
+         */
+        goto again;
+    } else if (err == 0) {
+        err = -ETIMEDOUT;
+        /* Timedout */
          goto errr;
      }
Can event handler still be triggered in case of timeout?
I have never hit such race.

But it is still possible with the theory.

And I guess either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id
should prevent this kind of racy issue.
In practice, they are possible that rtrs_clt_rdma_cm_handler() is in-flight during
'either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id'. rtrs_clt_rdma_cm_handler() and
cm's cleanup path need to hold mutex_lock(&con->con_mutex), once cm's cleanup path get this lock first
rtrs_clt_rdma_cm_handler has to sleep, when rtrs_clt_rdma_cm_handler is wakeup again, some resources has been
freed by cm's cleanup path.

First, stop_cm doesn't need to hold &con->con_mutex. But rtrs_clt_rdma_cm_handler
is called from rdma core layer which need  id_priv->handler_mutexinstead of con_mutex
I thinnk. Also RTRS has similar behavior as nvme host rdma, ib_srp and iser_verb.

@Jinpu/Haris, can we move destroy_cm right after stop_cm?*

*Thanks,
Guoqing