Re: [PATCH 1/1] net/mlx5: update debug log level for remote access error syndromes

From: Aru
Date: Tue Nov 01 2022 - 03:24:12 EST


Hi Leon,

On 10/25/22 10:48 PM, Leon Romanovsky wrote:
On Tue, Oct 25, 2022 at 02:22:01AM -0700, Arumugam Kolappan wrote:
The mlx5 driver dumps the entire CQE buffer by default for a few syndromes.
Some syndromes are expected due to the application behavior [ex:
MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and
MLX5_CQE_SYNDROME_LOCAL_PROT_ERR]. Hence, for these syndromes, the patch
converts the log level from KERN_WARNING to KERN_DEBUG. This enables the
application to get the CQE buffer dump by changing to KERN_DEBUG level
as and when needed.

Suggested-by: Leon Romanovsky <leon@xxxxxxxxxx>
Signed-off-by: Arumugam Kolappan <aru.kolappan@xxxxxxxxxx>
---
drivers/infiniband/hw/mlx5/cq.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index be189e0..d665129 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -267,10 +267,25 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
}
-static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
+static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
+ struct ib_wc *wc, int dump)
{
- mlx5_ib_warn(dev, "dump error cqe\n");
- mlx5_dump_err_cqe(dev->mdev, cqe);
+ const char *level;
+
+ if (!dump)
+ return;
+
+ mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
+ ib_wc_status_msg(wc->status));
Aren't you interested in hiding this print too? Right now, it will
be printed regardless of the value of your "dump" variable.

Thanks for pointing this out. Yes, this line also needs to be covered by the debug log level.

The existing helpers (mlx5_ib_warn(), mlx5_ib_err(), ...) do not accept a log level as an argument.

So I've added a new function, mlx5_ib_log(), which takes the log level as its first argument and prints the message at that level.
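
For readers following the thread, here is a rough user-space sketch of what a level-taking log helper can look like. It is not the code from the updated patch: demo_ib_log(), its buffer-based signature and the "<N>dev: " output format are assumptions made only so the idea can be compiled outside the kernel. The KERN_* macros are local stand-ins that mirror the SOH-plus-digit prefixes from <linux/kern_levels.h>.

```c
#include <stdarg.h>
#include <stdio.h>

/* User-space stand-ins for the printk level prefixes; in the kernel these
 * come from <linux/kern_levels.h>. */
#define KERN_SOH     "\001"
#define KERN_WARNING KERN_SOH "4"
#define KERN_DEBUG   KERN_SOH "7"

/* Hypothetical helper (not the real mlx5_ib_log()): formats one message
 * with the log level passed as an argument instead of being baked into
 * the helper's name. Writes "<N>dev: message" into buf and returns the
 * snprintf-style length. */
static int demo_ib_log(char *buf, size_t len, const char *level,
		       const char *dev_name, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	/* level[1] is the digit that follows the SOH byte. */
	n = snprintf(buf, len, "<%c>%s: ", level[1], dev_name);
	if (n < 0 || (size_t)n >= len)
		return n;

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}
```

With a helper of this shape, the "WC error" print and the CQE dump can share one level, so both are hidden or shown together.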


The updated patch will be posted in the next email for your review.

+
+ if (dump == 1)
+ level = KERN_WARNING;
+
+ if (dump == 2)
+ level = KERN_DEBUG;
Please change dump_cqe() arguments to receive level directly, so you
will set "dump = KERN_DEBUG" and not "dump = 2" in
mlx5_handle_error_cqe().

Yes. This is also taken care of in the updated patch.
Also, I've updated the subject line as you suggested in the other email.
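
As an illustration of passing the level directly rather than an integer code, here is a minimal user-space sketch. It is not the updated patch: the demo_syndrome enum, its values, the DEMO_SYNDROME_NO_DUMP case and demo_dump_level() are hypothetical names, and the thread does not say which syndromes skip the dump entirely. Only the three "expected" syndromes and the KERN_WARNING default are taken from the commit message.

```c
#include <stddef.h>

/* User-space stand-ins for the printk level prefixes; in the kernel these
 * come from <linux/kern_levels.h>. */
#define KERN_SOH     "\001"
#define KERN_WARNING KERN_SOH "4"
#define KERN_DEBUG   KERN_SOH "7"

/* Illustrative subset only; the numeric values are placeholders and do not
 * match the hardware syndrome encoding. */
enum demo_syndrome {
	DEMO_SYNDROME_LOCAL_LENGTH_ERR,
	DEMO_SYNDROME_LOCAL_PROT_ERR,
	DEMO_SYNDROME_REMOTE_ACCESS_ERR,
	DEMO_SYNDROME_REMOTE_OP_ERR,
	DEMO_SYNDROME_NO_DUMP,	/* hypothetical: a case that is never dumped */
};

/* Returns the level to dump the CQE at, or NULL to skip the dump.
 * The caller hands the returned string straight to the dump routine,
 * so no "1 means warning, 2 means debug" translation is needed. */
static const char *demo_dump_level(enum demo_syndrome syndrome)
{
	switch (syndrome) {
	case DEMO_SYNDROME_NO_DUMP:
		return NULL;
	case DEMO_SYNDROME_LOCAL_PROT_ERR:
	case DEMO_SYNDROME_REMOTE_ACCESS_ERR:
	case DEMO_SYNDROME_REMOTE_OP_ERR:
		/* Expected from application behaviour: debug only. */
		return KERN_DEBUG;
	default:
		return KERN_WARNING;
	}
}
```

The dump routine then only needs a NULL check before printing, and the level string itself replaces the separate "dump" flag.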


Thanks

Aru


Thanks