Re: [BUG] net: mellanox: mlx4: possible deadlock in mlx4_xdp_set() and mlx4_en_reset_config()

From: Tariq Toukan
Date: Wed Feb 09 2022 - 06:27:49 EST




On 2/7/2022 5:16 PM, Jia-Ju Bai wrote:
Hello,

My static analysis tool reports a possible deadlock in the mlx4 driver in Linux 5.16:


Hi Jia-Ju,
Thanks for your email.

Which static analysis tool do you use? Is it a standard one?

mlx4_xdp_set()
  mutex_lock(&mdev->state_lock); --> Line 2778 (Lock A)
  mlx4_en_try_alloc_resources()
    mlx4_en_alloc_resources()
      mlx4_en_destroy_tx_ring()
        mlx4_qp_free()
          wait_for_completion(&qp->free); --> Line 528 (Wait X)

The refcount_dec_and_test(&qp->refcount) in mlx4_qp_free() pairs with the refcount_set(&qp->refcount, 1) in mlx4_qp_alloc().
mlx4_qp_event() increments and then decrements the refcount around the call to qp->event(qp, event_type), to protect the qp from being freed while the handler runs.


mlx4_en_reset_config()
  mutex_lock(&mdev->state_lock); --> Line 3522 (Lock A)
  mlx4_en_try_alloc_resources()
    mlx4_en_alloc_resources()
      mlx4_en_destroy_tx_ring()
        mlx4_qp_free()
          complete(&qp->free); --> Line 527 (Wake X)

When mlx4_xdp_set() is executed, "Wait X" is performed while holding "Lock A". If mlx4_en_reset_config() is executed at this time, "Wake X" cannot be performed to wake up "Wait X" in mlx4_xdp_set(), because "Lock A" is already held by mlx4_xdp_set(), causing a possible deadlock.

I am not quite sure whether this possible problem is real and how to fix it if it is real.
Any feedback would be appreciated, thanks :)


Not possible.
These are two different qps, each maintaining its own instance of the refcount and of the completion, following the behavior I described above.
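
To make the argument concrete, here is a rough single-threaded userspace model of the per-qp lifetime described above. It is only a sketch, not driver code: struct qp_model and every qp_model_*() helper are hypothetical names made up for illustration, the completion is reduced to a boolean flag, and the model assumes that dropping the last reference is what signals qp->free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct mlx4_qp. Only the two fields
 * discussed in this thread are modeled; nothing here is driver code. */
struct qp_model {
    int refcount;    /* models qp->refcount */
    bool free_done;  /* models the completion qp->free having been signaled */
};

/* Models refcount_set(&qp->refcount, 1) in mlx4_qp_alloc(). */
static void qp_model_alloc(struct qp_model *qp)
{
    qp->refcount = 1;
    qp->free_done = false;
}

/* Models refcount_dec_and_test() followed by complete(&qp->free). */
static void qp_model_put(struct qp_model *qp)
{
    if (--qp->refcount == 0)
        qp->free_done = true;
}

/* Models mlx4_qp_event() taking its reference before calling qp->event(). */
static void qp_model_event_begin(struct qp_model *qp)
{
    qp->refcount++;
}

/* Models mlx4_qp_event() dropping its reference after qp->event() returns. */
static void qp_model_event_end(struct qp_model *qp)
{
    qp_model_put(qp);
}

/* Models the first half of mlx4_qp_free(): drop the initial reference.
 * Returns true if the following wait_for_completion() would return at once. */
static bool qp_model_free_start(struct qp_model *qp)
{
    qp_model_put(qp);
    return qp->free_done;
}

/* Walks through the two cases; returns 0 if every check holds. */
int qp_model_selfcheck(void)
{
    struct qp_model a, b;

    qp_model_alloc(&a);
    qp_model_alloc(&b);

    /* No event in flight on a: the free path signals its own completion. */
    assert(qp_model_free_start(&a));
    /* b is untouched by anything done to a. */
    assert(b.refcount == 1 && !b.free_done);

    /* Event in flight on b: the free path has to wait ... */
    qp_model_event_begin(&b);
    assert(!qp_model_free_start(&b));
    /* ... until the event path on the same qp drops its reference. */
    qp_model_event_end(&b);
    assert(b.free_done);

    return 0;
}
```

In this model the only thing a free can wait for is an event handler still running on the same qp, and that handler does not take state_lock in the model. A free of some other qp never enters the picture.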

Best wishes,
Jia-Ju Bai

Thanks,
Tariq