Re: [PATCH v3 1/2] RDMA/rxe: Update wqe_index for each wqe error completion

From: lizhijian@xxxxxxxxxxx
Date: Sun Jun 26 2022 - 23:42:15 EST




On 27/06/2022 05:51, Bob Pearson wrote:
> On 5/15/22 20:53, Li Zhijian wrote:
>> Previously, if user space keeps sending abnormal wqes, queue.prod will
>> keep increasing while queue.index doesn't. Once
>> queue.index == queue.prod in the next round, req_next_wqe() will treat
>> the queue as empty. In that case, no new completion will be generated.
>>
>> Update wqe_index for each wqe completion so that req_next_wqe() can get
>> the next wqe properly.
>>
>> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
>> ---
>> drivers/infiniband/sw/rxe/rxe_req.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index a0d5e57f73c1..8bdd0b6b578f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -773,6 +773,8 @@ int rxe_requester(void *arg)
>>  	if (ah)
>>  		rxe_put(ah);
>>  err:
>> +	/* update wqe_index for each wqe completion */
>> +	qp->req.wqe_index = queue_next_index(qp->sq.queue, qp->req.wqe_index);
>>  	wqe->state = wqe_state_error;
>>  	__rxe_do_task(&qp->comp.task);
>>
> This change looks plausible, but I am not sure if it will make a difference since the qp
> will get transitioned to the error state very shortly.
>
> In order for it to matter, the requester must be a ways ahead of the completer in the send queue
> and someone must be actively posting new wqes, which will reschedule the requester. Currently it
> will fail on the same wqe again unless the error described above occurs, but if we post a new valid
> wqe it will get executed even though we have detected an error that should have stopped the qp.
>
> It looks like the intent was to keep the qp in the non error state until all the old
> wqes get completed before making the transition.
Not really. My first intent was just to let req_next_wqe() return a wqe whenever the queue is
not empty. Currently, if rxe_requester() keeps going to the error path for some reason,
req_next_wqe() will falsely see the queue as empty on the next round even though it is almost full.

BTW, I will review your new private patches.

Thanks
Zhijian

> But we should disable the requester
> from processing new wqes in this case. That seems like a safer solution to the problem.
>
> Bob
>