Re: [PATCH rdma-rc] RDMA/mlx5: Fix dereg mr flow for kernel MRs

From: Thorsten Leemhuis
Date: Mon Jan 03 2022 - 08:16:07 EST


Hi, this is your Linux kernel regression tracker speaking.

On 03.01.22 10:51, Leon Romanovsky wrote:
> On Wed, Dec 22, 2021 at 10:51:58AM +0800, Tony Lu wrote:
>> On Tue, Dec 21, 2021 at 11:46:41AM +0200, Leon Romanovsky wrote:
>>> From: Maor Gottlieb <maorg@xxxxxxxxxx>
>>>
>>> The cited commit moved umem into the union, hence
>>> umem could be accessed only for user MRs. Add udata check
>>> before access umem in the dereg flow.
>>>
>>> Fixes: f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")
>>> Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> Signed-off-by: Maor Gottlieb <maorg@xxxxxxxxxx>
>>> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
>>> ---
>>> drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
>>> drivers/infiniband/hw/mlx5/mr.c | 4 ++--
>>> drivers/infiniband/hw/mlx5/odp.c | 4 ++--
>>> 3 files changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>>
>> This patch was tested and works for me in our environment for SMC. It
>> wouldn't panic when release link and call ib_dereg_mr.
>>
>> Tested-by: Tony Lu <tonylu@xxxxxxxxxxxxxxxxx>
>
> Thanks, unfortunately, this patch is incomplete.

Could you elaborate a bit and give a status update? It's hard to follow
from the outside. According to the "Fixes: f0ae4afe3d35" tag above, this
patch was supposed to fix a regression introduced in v5.16-rc5 that was
also reported here:
https://lore.kernel.org/linux-rdma/9974ea8c-f1cb-aeb4-cf1b-19d37536894a@xxxxxxxxxxxxxxxxx/

In fact, commit f0ae4afe3d35 was also backported to v5.15.y and might
cause trouble there as well.

Should it maybe simply be reverted (and reapplied with all fixes later)
in mainline (5.16 will likely be released in 6 days!) and v5.15.y?

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table and can only look briefly into most of them. As a result, I
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
about it in a public reply, as that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.