RE: [RFC PATCH v2 1/7] IB/mlx5: Change ib_umem_odp_map_dma_single_page() to retain umem_mutex

From: Daisuke Matsuda (Fujitsu)
Date: Fri Nov 18 2022 - 00:50:15 EST


On Thu, Nov 17, 2022 10:21 PM Li, Zhijian wrote:
> On 11/11/2022 17:22, Daisuke Matsuda wrote:
> > ib_umem_odp_map_dma_single_page(), which has been used only by the mlx5
> > driver, holds umem_mutex on success and releases on failure. This
> > behavior is not convenient for other drivers to use it, so change it to
> > always retain mutex on return.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> > ---
> > drivers/infiniband/core/umem_odp.c | 8 +++-----
> > drivers/infiniband/hw/mlx5/odp.c | 4 +++-
> > 2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> > index e9fa22d31c23..49da6735f7c8 100644
> > --- a/drivers/infiniband/core/umem_odp.c
> > +++ b/drivers/infiniband/core/umem_odp.c
> > @@ -328,8 +328,8 @@ static int ib_umem_odp_map_dma_single_page(
> > *
> > * Maps the range passed in the argument to DMA addresses.
> > * The DMA addresses of the mapped pages is updated in umem_odp->dma_list.
> > - * Upon success the ODP MR will be locked to let caller complete its device
> > - * page table update.
> > + * The umem mutex is locked in this function. Callers are responsible for
> > + * releasing the lock.
> > *
>
>
> > * Returns the number of pages mapped in success, negative error code
> > * for failure.
> > @@ -453,11 +453,9 @@ int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64 user_virt,
> > break;
> > }
> > }
> > - /* upon success lock should stay on hold for the callee */
> > +
> > if (!ret)
> > ret = dma_index - start_idx;
> > - else
> > - mutex_unlock(&umem_odp->umem_mutex);
> >
> > out_put_mm:
> > mmput_async(owning_mm);
> > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> > index bc97958818bb..a0de27651586 100644
> > --- a/drivers/infiniband/hw/mlx5/odp.c
> > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > @@ -572,8 +572,10 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
> > access_mask |= ODP_WRITE_ALLOWED_BIT;
> >
> > np = ib_umem_odp_map_dma_and_lock(odp, user_va, bcnt, access_mask, fault);
> > - if (np < 0)
> > + if (np < 0) {
> > + mutex_unlock(&odp->umem_mutex);
> > return np;
> > + }
>
> refer to the comments of ib_umem_odp_map_dma_and_lock:
> 334 * Returns the number of pages mapped in success, negative error
> code
> 335 * for failure.
>
> I don't think it's correct to release the lock in all failure case, for
> example when it reaches below error path.

Thank you. That's certainly true. I will fix this in v3.
Probably I will drop this patch and make changes on rxe side instead.

>
> 346 int ib_umem_odp_map_dma_and_lock(struct ib_umem_odp *umem_odp, u64
> user_virt,
> 347 u64 bcnt, u64 access_mask, bool
> fault)
> 348 __acquires(&umem_odp->umem_mutex)
>
> 349 {
>
> 350 struct task_struct *owning_process = NULL;
>
> 351 struct mm_struct *owning_mm = umem_odp->umem.owning_mm;
>
> 352 int pfn_index, dma_index, ret = 0, start_idx;
>
> 353 unsigned int page_shift, hmm_order, pfn_start_idx;
>
> 354 unsigned long num_pfns, current_seq;
>
> 355 struct hmm_range range = {};
>
> 356 unsigned long timeout;
>
> 357
>
> 358 if (access_mask == 0)
>
> 359 return -EINVAL; <<<<< no lock is hold yet
>
> 360
>
> 361 if (user_virt < ib_umem_start(umem_odp) ||
>
> 362 user_virt + bcnt > ib_umem_end(umem_odp))
>
> 363 return -EFAULT; <<<<< no lock is hold yet
>
>
> Further more, you changed public API's the behavior, do it matter for
> other out-of-tree drivers which is using it, i'm not familiar with this,
> maybe kernel has no restriction on it ?

Yes, they are just left behind. The developers have to modify the driver
by themselves if they want to keep it compatible with the upstream.
I think this is one of the general reasons why companies contribute
their works to OSS communities. They can lower the cost of maintaining
their software while getting benefits from new features.

Cf. The Linux Kernel Driver Interface
https://www.kernel.org/doc/html/latest/process/stable-api-nonsense.html

Thanks,
Daisuke

>
>
> >
> > /*
> > * No need to check whether the MTTs really belong to this MR, since