Re: [PATCH] soc: qcom: qmi: Signal the txn completion after releasing the mutex

From: Pavan Kondeti
Date: Mon Jul 31 2023 - 11:19:37 EST


On Mon, Jul 31, 2023 at 06:37:55PM +0530, Praveenkumar I wrote:
> txn is in #1 stack
>
> Worker #1 Worker #2
> ******** *********
>
> qmi_txn_wait(txn) qmi_handle_message
> | |
> | |
> wait_for_complete(txn->complete) ....
> | mutex_lock(txn->lock)
> | |
> mutex_lock(txn->lock) |
> ..... complete(txn->lock)
> | mutex_unlock(txn->lock)
> |
> mutex_unlock(txn->lock)
>
> In this case above, while #2 is doing the mutex_unlock(txn->lock),
> in between releasing lock and doing other lock related wakeup, #2 gets
> scheduled out. As a result #1, acquires the lock, unlocks, also
> frees the txn also (where the lock resides)
>
> Now #2, gets scheduled again and tries to do the rest of the lock
> related wakeup, but lock itself is invalid because txn itself is gone.
>
> Fixing this, by doing the mutex_unlock(txn->lock) first and then
> complete(txn->lock) in #2
>
> Fixes: 3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Sricharan Ramabadhran <quic_srichara@xxxxxxxxxxx>
> Signed-off-by: Praveenkumar I <quic_ipkumar@xxxxxxxxxxx>
> ---
> drivers/soc/qcom/qmi_interface.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
> index 78d7361fdcf2..92e29db97359 100644
> --- a/drivers/soc/qcom/qmi_interface.c
> +++ b/drivers/soc/qcom/qmi_interface.c
> @@ -505,12 +505,13 @@ static void qmi_handle_message(struct qmi_handle *qmi,
> pr_err("failed to decode incoming message\n");
>
> txn->result = ret;
> - complete(&txn->completion);
> } else {
> qmi_invoke_handler(qmi, sq, txn, buf, len);
> }
>
> mutex_unlock(&txn->lock);
> + if (txn->dest && txn->ei)
> + complete(&txn->completion);
> } else {
> /* Create a txn based on the txn_id of the incoming message */
> memset(&tmp_txn, 0, sizeof(tmp_txn));

What happens in a remote scenario where the waiter gets timed out at the
very same time you are releasing the mutex but before calling
complete()? The caller might end up freeing txn structure and it results
in the same issue you are currently facing.

Thanks,
Pavan