Re: [v5 PATCH] block: introduce block_rq_error tracepoint

From: Yang Shi
Date: Wed Feb 02 2022 - 12:48:10 EST


On Thu, Jan 27, 2022 at 10:18 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Thu, Jan 27, 2022 at 1:53 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Jan 26, 2022 at 10:51:53AM -0800, Yang Shi wrote:
> > > + __entry->dev = rq->q->disk ? disk_devt(rq->q->disk) : 0;
> > > + __assign_str(name, rq->q->disk ? rq->q->disk->disk_name : "?");
> >
> > None f the other tracepoints has the disk name, why does this one need
> > it? And if you add it please avoid the overly long line.
>
> I guess the disk name was added to ease some handling in userspace
> tools. But if all other tracepoints don't have disk name shown, I
> think I'd better follow the convention. I did overlook this when I
> ported this patch. Will remove it.
>
> >
> > > + __entry->sector = blk_rq_pos(rq);
> > > + __entry->nr_sector = nr_bytes >> 9;
> > > + __entry->error = blk_status_to_errno(error);
> >
> > This still converts the block status to an errno.
>
> I may misunderstand your comments. I just followed what
> block_rq_complete tracepoint does. Or the linux-block community is
> converting all tracepoints to show blk status code instead of
> conventional errno?
>
> And the userspace tool doesn't know blk status code and still expects
> the conventional errno. For example, rasdaemon reads block_rq_complete
> events now and have the below:
>
> static const struct {
> int error;
> const char *name;
> } blk_errors[] = {
> { -EOPNOTSUPP, "operation not supported error" },
> { -ETIMEDOUT, "timeout error" },
> { -ENOSPC, "critical space allocation error" },
> { -ENOLINK, "recoverable transport error" },
> { -EREMOTEIO, "critical target error" },
> { -EBADE, "critical nexus error" },
> { -ENODATA, "critical medium error" },
> { -EILSEQ, "protection error" },
> { -ENOMEM, "kernel resource error" },
> { -EBUSY, "device resource error" },
> { -EAGAIN, "nonblocking retry error" },
> { -EREMCHG, "dm internal retry error" },
> { -EIO, "I/O error" },
> };

Gently ping. Should I print blk status code instead of errno for this
trace point? But I really don't get why. And block_rq_complete
tracepoint does:

TP_fast_assign(
__entry->dev = rq->q->disk ? disk_devt(rq->q->disk) : 0;
__entry->sector = blk_rq_pos(rq);
__entry->nr_sector = nr_bytes >> 9;
__entry->error = blk_status_to_errno(error); <===
convert blk status code to errno

blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
__get_str(cmd)[0] = '\0';
),

>
> This patch aims to add block_rq_err tracepoint to replace
> block_rq_complete in rasdaemon.