Re: [PATCH v3 0/2] RDMA/rxe: Fix no completion event issue

From: Li, Zhijian
Date: Sat Jun 25 2022 - 23:30:05 EST



On 6/25/2022 8:59 PM, Yanjun Zhu wrote:

On 2022/6/7 16:32, lizhijian@xxxxxxxxxxx wrote:
Hi Jason & Yanjun


I know there are still a few regressions in RXE, but I do hope you can take some time to review these *simple bugfix* patches.
They are not related to the regressions.

There are currently some problems reported by Red Hat and other Linux vendors.

We had better focus on these problems.

+ Xiao
I do believe regressions are a high priority, and I'm very willing to contribute our efforts to improve the stability of RXE :)
Yang, Xiao and I tried to reproduce the issues reported on the mailing list, and we also tried to review the corresponding patches.
However, we didn't find a unified place, something like Bugzilla, that tracks the issues and their status, and most of
them could not be reproduced in our local environment. So it's a bit hard for us to review/verify the patches, especially the
large/complicated ones, if we don't have the use cases.

BTW, IMO we shouldn't stop reviewing fixes other than those for the recent regressions.

Zhijian


Zhu Yanjun



Thanks
Zhijian


On 16/05/2022 09:53, Li Zhijian wrote:
Since RXE always posts RDMA_WRITE successfully, it's observed that
no more completions occur after a few incorrect posts. Actually, this
blocks the polling. We can easily reproduce it with the pattern below.

a. post correct RDMA_WRITE
b. poll completion event
while true {
    c. post incorrect RDMA_WRITE(wrong rkey for example)
    d. poll completion event <<<< block after 2 incorrect RDMA_WRITE posts
}
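
The pattern above can be sketched as a small C harness. This is only an illustration, not code from the patch series: `repro_ops`, `run_repro`, `wait_completion` and `mock_qp` are hypothetical names, the polling is bounded so a missing completion is reported instead of hanging, and the mock merely imitates the reported symptom so the loop can run without an RDMA device. In a real test the two hooks would presumably wrap ibv_post_send() with an IBV_WR_RDMA_WRITE work request and ibv_poll_cq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; a real test would back them with libibverbs calls. */
struct repro_ops {
    /* Post one RDMA_WRITE; bad_rkey selects a deliberately wrong rkey. 0 on success. */
    int (*post_write)(void *ctx, bool bad_rkey);
    /* Non-blocking poll: 1 if a completion was reaped, 0 if none, <0 on error. */
    int (*poll_once)(void *ctx);
};

/* Poll at most max_polls times; true if a completion arrived. */
static bool wait_completion(const struct repro_ops *ops, void *ctx, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        int n = ops->poll_once(ctx);
        if (n < 0)
            return false;
        if (n > 0)
            return true;
    }
    return false;
}

/*
 * Returns the index (1-based) of the first incorrect post whose completion
 * never showed up, 0 if all max_bad_posts posts completed, and -1 if a post
 * failed or the initial correct write did not complete.
 */
static int run_repro(const struct repro_ops *ops, void *ctx,
                     int max_bad_posts, int max_polls)
{
    assert(ops != NULL && ops->post_write != NULL && ops->poll_once != NULL);

    /* a. + b. one correct RDMA_WRITE and its completion */
    if (ops->post_write(ctx, false) != 0 || !wait_completion(ops, ctx, max_polls))
        return -1;

    for (int i = 1; i <= max_bad_posts; i++) {
        /* c. post incorrect RDMA_WRITE */
        if (ops->post_write(ctx, true) != 0)
            return -1;
        /* d. poll; a missing completion is the reported symptom */
        if (!wait_completion(ops, ctx, max_polls))
            return i;
    }
    return 0;
}

/* Mock stand-in (an assumption, not RXE logic): posts always succeed, but
 * incorrect posts stop producing completions from the stall_at-th one on.
 * stall_at == 0 means every post completes. */
struct mock_qp {
    int pending;
    int bad_posts;
    int stall_at;
};

static int mock_post(void *ctx, bool bad_rkey)
{
    struct mock_qp *q = ctx;

    if (!bad_rkey) {
        q->pending++;
        return 0;
    }
    q->bad_posts++;
    if (q->stall_at == 0 || q->bad_posts < q->stall_at)
        q->pending++;
    return 0;
}

static int mock_poll(void *ctx)
{
    struct mock_qp *q = ctx;

    if (q->pending > 0) {
        q->pending--;
        return 1;
    }
    return 0;
}

static const struct repro_ops mock_ops = { mock_post, mock_poll };
```

With the mock configured as `stall_at = 2`, `run_repro(&mock_ops, &q, 5, 100)` reports the stall at the second incorrect post, which is the behavior described in step d.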


Li Zhijian (2):
    RDMA/rxe: Update wqe_index for each wqe error completion
    RDMA/rxe: Generate error completion for error requester QP state

   drivers/infiniband/sw/rxe/rxe_req.c | 12 +++++++++++-
   1 file changed, 11 insertions(+), 1 deletion(-)